CN114943799A - Face image processing method and device and computer readable storage medium


Info

Publication number
CN114943799A
Authority
CN
China
Prior art keywords
face
image
sample
features
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110648608.5A
Other languages
Chinese (zh)
Inventor
陈旭
王宇晗
朱俊伟
储文青
邰颖
汪铖杰
李季檩
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110648608.5A priority Critical patent/CN114943799A/en
Publication of CN114943799A publication Critical patent/CN114943799A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention discloses a face image processing method, a face image processing device and a computer readable storage medium. After a face image of a source face and a face template image of a template face are obtained, feature extraction is carried out on the face image and the face template image to obtain image texture features of the source object and attribute features of the object in the face template image. Face modeling is then carried out on the source object and the object in the face template image to obtain first three-dimensional modeling parameters of the source object and second three-dimensional modeling parameters of the object in the face template image, and the first and second three-dimensional modeling parameters are fused to obtain target three-dimensional modeling parameters. A three-dimensional face image is constructed according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image, and the object in the face template image is replaced by the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image. The scheme can improve the accuracy of face image processing.

Description

Face image processing method and device and computer readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for processing a face image, and a computer-readable storage medium.
Background
In recent years, with the development of technology, applications such as movie special effects and internet social networking have created a demand for replacing the face of one subject in a face image with the face of another subject while keeping the style of the original image unchanged; meeting this demand requires face image processing.
In the process of research and practice of the prior art, the inventor of the present invention found that methods based purely on three-dimensional modeling cannot handle complex scenes and texture details, while the face shape error of the replaced face obtained by methods based on generative adversarial networks (GANs) is large, which greatly reduces the accuracy of face image processing.
Disclosure of Invention
The embodiment of the invention provides a face image processing method, a face image processing device and a computer readable storage medium, which can improve the accuracy of face image processing.
A facial image processing method, comprising:
obtaining a face image of a source face and a face template image of a template face, the face image including a source object;
extracting the characteristics of the face image and the face template image to obtain the image texture characteristics of the source object and the attribute characteristics of the object in the face template image;
according to the face image and the face template image, performing face modeling on the source object and an object in the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter;
constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face characteristics of the three-dimensional face image;
and replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image.
Accordingly, an embodiment of the present invention provides a face image processing apparatus, including:
an acquisition unit configured to acquire a face image of a source face and a face template image of a template face, the face image including a source object;
the extracting unit is used for extracting the characteristics of the face image and the face template image to obtain the image texture characteristics of the source object and the attribute characteristics of the object in the face template image;
the fusion unit is used for carrying out face modeling on the source object and an object in the face template image according to the face image and the face template image so as to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter so as to obtain a target three-dimensional modeling parameter;
the construction unit is used for constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image;
and the replacing unit is used for replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image.
Optionally, in some embodiments, the fusion unit may be specifically configured to extract a face shape parameter corresponding to the face image from the first three-dimensional modeling parameter; extracting facial action parameters corresponding to the facial template image from the second three-dimensional modeling parameters; and fusing the facial shape parameters and the facial action parameters to obtain target three-dimensional modeling parameters.
Optionally, in some embodiments, the replacing unit may be specifically configured to splice the image texture features and the three-dimensional facial features to obtain facial features; fuse the facial features and the attribute features by adopting a trained image processing model to obtain fused facial features; and generate a target face image based on the fused facial features, the target face image being an image in which the object in the face template image is replaced with the source object.
Optionally, in some embodiments, the replacing unit may be specifically configured to construct a face mask corresponding to the fused facial features to obtain an initial face mask; fuse the initial face mask, the fused facial features and the attribute features to obtain target facial features; adjust the target facial features, and construct a face mask corresponding to the adjusted facial features to obtain a target face mask; and generate a target face image based on the target face mask and the adjusted facial features.
Optionally, in some embodiments, the replacing unit may be specifically configured to perform feature transformation on the attribute features to obtain target attribute features of the object in the face template image; determine weighting parameters of the fused facial features and the target attribute features according to the initial face mask; and weight the fused facial features and the target attribute features according to the weighting parameters, and fuse the weighted facial features and the weighted attribute features to obtain the target facial features.
Optionally, in some embodiments, the replacing unit may be specifically configured to generate an initial face image according to the adjusted facial features, and screen out the image within the target face mask from the initial face image to obtain a basic face image; identify the image outside the target face mask in the face template image to obtain a background image; and fuse the basic face image and the background image to obtain a target face image.
Optionally, in some embodiments, the facial image processing apparatus may further include a training unit, where the training unit may be specifically configured to obtain a facial image sample set, and screen out at least one image sample pair from the facial image sample set, where the image sample pair includes a facial image sample and a facial template image sample; replacing the object in the face template image sample with the object in the face image sample by adopting a preset image processing model to obtain a predicted face image; and converging the preset image processing model based on the image sample pair and the predicted face image to obtain a trained image processing model.
Optionally, in some embodiments, the training unit may be specifically configured to perform feature extraction on the face image sample and the face template image sample by using a preset image processing model, so as to obtain image sample texture features of the face image sample and sample attribute features of the object in the face template image sample; respectively identify sample three-dimensional modeling parameters from the face image sample and the face template image sample, and fuse the identified sample three-dimensional modeling parameters to obtain target sample three-dimensional modeling parameters; construct a sample three-dimensional face image according to the target sample three-dimensional modeling parameters to obtain sample three-dimensional face features of the sample three-dimensional face image, and fuse the sample three-dimensional face features, the image sample texture features and the sample attribute features to obtain fused sample facial features; and construct a face mask corresponding to the fused sample facial features to obtain an initial sample face mask, and generate a predicted face image based on the initial sample face mask, the fused sample facial features and the sample attribute features.
Optionally, in some embodiments, the training unit may be specifically configured to fuse the initial sample face mask, the fused sample face features, and the sample attribute features to obtain target sample face features; adjusting the target sample facial features, and constructing a facial mask corresponding to the adjusted sample facial features to obtain a target sample facial mask; generating a predicted face image based on the target sample face mask and the adjusted sample facial features.
Optionally, in some embodiments, the training unit may be specifically configured to generate an initial predicted face image based on the target sample facial features and an initial sample face mask; determining shape loss information of the image sample pair according to the sample three-dimensional face image, the initial predicted face image and the predicted face image; respectively calculating the face similarity of the face image sample in the image sample pair with the predicted face image and the initial predicted face image to obtain image loss information of the image sample pair; determining segmentation loss information for the image sample pair based on the initial sample face mask and a target sample face mask; determining face loss information for the image sample pair from the image sample pair, a predicted face image, and an initial predicted face image; and fusing the shape loss information, the image loss information, the segmentation loss information and the face loss information, and converging a preset image processing model based on the fused loss information to obtain a trained image processing model.
Optionally, in some embodiments, the training unit may be specifically configured to acquire first projection information of the sample three-dimensional face image, and extract first position information of a face contour from the first projection information; constructing the initial prediction face image and a target three-dimensional face image corresponding to the prediction face image, and acquiring second projection information of the target three-dimensional face image; and extracting second position information of the face contour in the initial predicted face image and third position information of the face contour in the predicted face image from the second projection information, and respectively calculating the distance between the face contours according to the first position information, the second position information and the third position information so as to obtain the shape loss information of the image sample pair.
Optionally, in some embodiments, the training unit may be specifically configured to obtain a template map mask of the face template image sample from the image samples; adjusting the size of the template picture mask to obtain an adjusted template picture mask; and respectively calculating the size difference values of the adjusted template image mask, the initial sample face mask and the target sample face mask, and fusing the size difference values to obtain the segmentation loss information of the image sample pair.
Optionally, in some embodiments, the training unit may be specifically configured to calculate similarities between the image sample pair and the predicted face image and the initial predicted face image, respectively, so as to obtain similarity loss information of the image sample pair; determine adversarial loss information and cycle loss information of the image sample pair according to the face template image sample, the predicted face image and the initial predicted face image in the image sample pair; and use the similarity loss information, the adversarial loss information and the cycle loss information as the face loss information of the image sample pair.
Optionally, in some embodiments, the training unit may be specifically configured to, when the object in the face image sample and the object in the face template image sample are the same object, respectively calculate spatial similarities between the face template image sample and the predicted face image and the initial predicted face image, so as to obtain spatial similarity loss information of the image sample pair; extracting image features of the face template image sample, the predicted face image and the initial predicted face image, and calculating feature similarity between the image features to obtain feature similarity loss information of the image sample pair; and taking the spatial similarity loss information and the characteristic similarity loss information as similarity loss information of the image sample pair.
In addition, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores an application program, and the processor is configured to run the application program in the memory to implement the facial image processing method provided in the embodiment of the present invention.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, which stores a plurality of instructions, where the instructions are suitable for being loaded by a processor to execute the steps in any one of the facial image processing methods provided by the embodiment of the present invention.
After a face image of a source face and a face template image of a template face are obtained, feature extraction is carried out on the face image and the face template image to obtain image texture features of the source object and attribute features of an object in the face template image, then, face modeling is carried out on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, a target three-dimensional modeling parameter is obtained by fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter, a three-dimensional face image is constructed according to the target three-dimensional modeling parameter, and three-dimensional face features of the three-dimensional face image are obtained; replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image; according to the scheme, three-dimensional modeling parameters can be identified in the face image and the face template image, the three-dimensional face image is constructed based on the three-dimensional modeling parameters, so that three-dimensional face features are obtained, the face features are geometrically constrained, the three-dimensional face features, image texture features and attribute features are fused, the result of the replaced face can be more real, and therefore the accuracy of face image processing can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a scene schematic diagram of an image processing method provided by an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target three-dimensional modeling parameter reorganization provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of facial feature stitching provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of generating a target face image provided by an embodiment of the invention;
FIG. 6 is a schematic diagram of determining shape loss information provided by an embodiment of the present invention;
FIG. 7 is an overall schematic diagram of a trained image processing model provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating the comparison between the target face-changed image and the prior art face-changed image according to an embodiment of the present invention;
FIG. 9 is a schematic flowchart of another image processing method according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a face image processing apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another face image processing apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a face image processing method, a face image processing device and a computer-readable storage medium. The face image processing apparatus may be integrated into an electronic device, and the electronic device may be a server or a terminal.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
For example, referring to fig. 1, taking an example that the face image processing apparatus is integrated in an electronic device, after acquiring a face image of a source face and a face template image of a template face, the electronic device performs feature extraction on the face image and the face template image to obtain image texture features of the source object and attribute features of an object in the face template image, then performs face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain first three-dimensional modeling parameters of the source object and second three-dimensional modeling parameters of the object in the face template image, fuses the first three-dimensional modeling parameters and the second three-dimensional modeling parameters to obtain target three-dimensional modeling parameters, and constructs a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image; and replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image, so that the accuracy of face image processing is improved.
The process of face image processing may be regarded as replacing the object in the face template image with the source object from the source image, which may be understood as changing the face of the face object in the face template image. Taking a human face as an example, face changing refers to changing the identity of the face in the face template image into the person in the source image while keeping elements of the face template image, such as the pose, the expression, the makeup and the background, unchanged. The method can generally be applied to scenes such as movie and television production, game entertainment and e-commerce sales.
It should be noted that the image processing method provided in the embodiment of the present application relates to a computer vision technology in the field of artificial intelligence, that is, in the embodiment of the present application, an object in a face template image may be replaced by a source object of a face image by using the computer vision technology of artificial intelligence, so as to obtain a target face image.
So-called Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly comprise computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and the like.
Computer Vision technology (CV) is a science that studies how to make machines "see": it uses cameras and computers instead of human eyes to perform machine vision tasks such as identification, tracking and measurement on a target, and further performs image processing so that the processed image becomes more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving and intelligent transportation, as well as common biometric identification technologies such as face recognition and fingerprint recognition.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The present embodiment will be described from the perspective of a face image processing apparatus, which may be specifically integrated in an electronic device, where the electronic device may be a server or a terminal; the terminal may include a tablet Computer, a notebook Computer, a Personal Computer (PC), a wearable device, a virtual reality device, or other intelligent devices capable of performing facial image processing.
A facial image processing method, comprising:
acquiring a face image of a source face and a face template image of a template face, wherein the face image comprises a source object; performing feature extraction on the face image and the face template image to obtain image texture features of the source object and attribute features of the object in the face template image; performing face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter; constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image; and replacing the object in the face template image with the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image.
As shown in fig. 2, the specific flow of the image processing method is as follows:
101. a face image of a source face and a face template image of a template face are obtained.
The face image includes a source object; the source object may be an object contained in the face image. Taking a human face image as an example, the source object may be the person corresponding to the face image. The source face is the source of the face object that provides the replacement; it corresponds to the template face, which contains the face object to be replaced and other elements, such as the face background, that are to be kept.
The face image and the face template image may be acquired in various manners. For example, they may be acquired directly; or, when the number of images is large or the images occupy a large amount of memory, they may be acquired indirectly. Specifically, the method may include the following steps:
(1) direct acquisition of face images and face template images
For example, the original face image uploaded by the user and the image processing information corresponding to the original face image may be directly received, and the face image of the source face and the face template image of the template face may be screened out from the original face image based on the image processing information. Alternatively, a pair of face images may be acquired from an image database or a network, one image of the pair may be arbitrarily selected as the face image of the source face, and the other image of the pair may be used as the face template image.
(2) Indirect acquisition of face images and face template images
For example, an image processing request sent by the terminal may be received, the image processing request carries a storage address of the original face image and image processing information, the original face image is obtained in a memory, a cache or a third-party database according to the storage address, and the face image of the source face and the face template image of the template face are screened out from the original face image according to the image processing information.
Optionally, after the original face image is successfully acquired, a prompt message may be sent to the terminal to prompt the terminal to successfully acquire the original face image.
Optionally, after the original face image is successfully acquired, the original face image may be preprocessed to obtain the face image and the face template image. There are various preprocessing manners; for example, the size of the original face image may be adjusted to a preset size, or the face object in the original face image may be aligned to a uniform position by using face key point registration, as sketched below.
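As an illustrative sketch of this preprocessing step (not taken from the patent itself): the canonical landmark coordinates, the 5-point landmark layout and the use of OpenCV are all assumptions; the patent only states that faces are aligned to a uniform position and resized to a preset size.

```python
import cv2
import numpy as np

# Assumed canonical positions (eyes, nose tip, mouth corners) for a
# 256 x 256 aligned face.
CANONICAL_5PTS = np.float32([
    [89, 109], [167, 109],   # left eye, right eye
    [128, 152],              # nose tip
    [97, 196], [159, 196],   # mouth corners
])

def align_face(image: np.ndarray, landmarks: np.ndarray, size: int = 256) -> np.ndarray:
    """Align a face to a uniform position via key-point registration and
    resize it to the preset size in a single affine warp."""
    matrix, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), CANONICAL_5PTS)
    return cv2.warpAffine(image, matrix, (size, size))
```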
102. And extracting the characteristics of the face image and the face template image to obtain the image texture characteristics of the source object and the attribute characteristics of the object in the face template image.
The image texture feature may be an identity feature on the facial image texture.
The feature extraction method may be various, and specifically includes the following steps:
for example, the feature of the face template image may be encoded by using an encoder network of the trained image processing model to obtain the attribute features of the object in the face template image, and the feature of the face image may be extracted by using a face recognition network of the trained image processing model to obtain the image texture features of the source object.
The structure of the encoder network may be various. For example, the encoder network may be a residual network formed by stacking a plurality of residual blocks (Res-Blocks), where the specific number of stacked blocks may be set according to the practical application, for example, 8 or any other number; or it may be a coding block composed of a plurality of convolutional layers and activation layers.
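A minimal sketch of such a residual encoder is shown below; the channel width, kernel sizes and activation choice are assumptions, since the patent only specifies a stack of Res-Blocks.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual block (Res-Block): two conv layers plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

class Encoder(nn.Module):
    """Attribute encoder: a stem conv followed by a stack of Res-Blocks."""
    def __init__(self, in_channels: int = 3, channels: int = 64, num_blocks: int = 8):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])

    def forward(self, x):
        return self.blocks(self.stem(x))

# Example: encode a 256x256 face template image into attribute features.
template = torch.randn(1, 3, 256, 256)
attribute_features = Encoder()(template)
```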
103. According to the face image and the face template image, face modeling is carried out on the source object and the object in the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and the first three-dimensional modeling parameter and the second three-dimensional modeling parameter are fused to obtain a target three-dimensional modeling parameter.
The three-dimensional modeling parameters may be parameters for constructing a three-dimensional face image (model). The three-dimensional modeling parameters include face shape parameters and face action parameters, and the face action parameters may include expression parameters and posture parameters. A three-dimensional model of the face may be constructed based on the three-dimensional modeling parameters by a 3DMM (3D Morphable Model, a three-dimensional face reconstruction model) or another three-dimensional face reconstruction model.
The modes of performing face modeling on the source object and the object in the face template image and fusing three-dimensional modeling parameters can be various, and specifically, the modes can be as follows:
for example, a three-dimensional face reconstruction model may be used to perform regression on a face image and a face template image, so as to perform face modeling on a source object and an object in the face template image, obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image from the constructed face model, extract a face shape parameter corresponding to the face image from the first three-dimensional modeling parameter, extract a face action parameter corresponding to the face template image from the second three-dimensional modeling parameter, and fuse the face shape parameter and the face action parameter to obtain a target three-dimensional modeling parameter.
For example, the first three-dimensional modeling parameter may be directly extracted from the face model of the source object, and the second three-dimensional modeling parameter may be extracted from the face model of the object in the face template image, or the face model may be converted into the three-dimensional modeling parameters, so as to obtain the first three-dimensional modeling parameter and the second three-dimensional modeling parameter.
After the first three-dimensional modeling parameter and the second three-dimensional modeling parameter are obtained, they can be fused. The fusion process can actually be regarded as recombining the three-dimensional modeling parameters: facial shape parameters are extracted from the first three-dimensional modeling parameter of the face image, facial action parameters are extracted from the second three-dimensional modeling parameter of the face template image, and the facial shape parameters and the facial action parameters are fused. The fusion can be performed in various ways; for example, the parameters can be directly spliced and combined, or weighting parameters and basis modeling parameters of the facial shape parameters and the facial action parameters can be obtained, the facial shape parameters and the facial action parameters can be weighted according to the weighting parameters, and the weighted facial shape parameters and the weighted facial action parameters can be fused with the basis modeling parameters to obtain the target three-dimensional modeling parameters, as shown in formula (1):

S = S̄ + α·S_shape + β·S_action   (1)

where S is the target three-dimensional modeling parameter, α and β are the weighting parameters, S_shape and S_action are the facial shape parameter and the facial action parameter, and S̄ is the basis modeling parameter.
The target three-dimensional modeling parameters may be regarded as recombining the face shape of the source object in the face image with the expression and posture of the object in the face template image. As shown in fig. 3, the recombined target three-dimensional modeling parameters take into account both the face shape of the source object in the face image and the expression and posture of the object in the face template image in the geometric features, so that the similarity of the face shapes can be improved.
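The recombination can be illustrated with a short sketch; the parameter names and coefficient dimensions below are assumptions, since the patent only specifies that the shape comes from the source and the expression and posture come from the template.

```python
import numpy as np

def recombine_3dmm_params(source_params: dict, template_params: dict) -> dict:
    """Build the target 3D modeling parameters: face shape (identity)
    coefficients from the source image, action (expression + pose)
    coefficients from the template image."""
    return {
        "shape": source_params["shape"],               # face shape from the source object
        "expression": template_params["expression"],   # expression from the template
        "pose": template_params["pose"],               # head pose from the template
    }

# Example with made-up coefficient vectors of typical 3DMM sizes.
source = {"shape": np.random.randn(80), "expression": np.random.randn(64), "pose": np.random.randn(6)}
template = {"shape": np.random.randn(80), "expression": np.random.randn(64), "pose": np.random.randn(6)}
target = recombine_3dmm_params(source, template)
```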
104. And constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image.
Wherein the three-dimensional face image may be a face image of a three-dimensional model constructed based on the target three-dimensional modeling parameters.
The three-dimensional facial features may be the shape features of the face image and the action (expression and posture) features of the face template image contained in the three-dimensional face image.
The three-dimensional face image may be constructed in various ways, which may specifically be as follows:
for example, a three-dimensional face model corresponding to the target three-dimensional modeling parameter may be constructed by a three-dimensional face reconstruction model to obtain a three-dimensional face image, and three-dimensional features of the three-dimensional face model may be obtained, or a three-dimensional face model corresponding to the target three-dimensional modeling parameter may be constructed by a three-dimensional face reconstruction model, and the three-dimensional face model may be adjusted, for example, local fitting or optimization may be performed to obtain a three-dimensional face image, and three-dimensional features of the adjusted three-dimensional face model may be obtained as three-dimensional face features of the three-dimensional face image.
105. And replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image.
For example, the image texture features and the three-dimensional face features may be spliced to obtain facial features, the trained image processing model is used to fuse the facial features and the attribute features to obtain fused facial features, and a target face image is generated based on the fused facial features, where the target face image is an image obtained by replacing the object in the face template image with the source object. Specifically, the method may be as follows:
and S1, splicing the image texture features and the three-dimensional face features to obtain the face features.
For example, the image texture features and the three-dimensional facial features may be directly stitched to obtain the facial features, or weighting parameters of the image texture features and the three-dimensional facial features may be obtained, the image texture features and the three-dimensional facial features may be weighted according to the weighting parameters, and the weighted image texture features and the three-dimensional facial features may be fused to obtain the facial features, or feature depths of the image texture features and the three-dimensional facial features may also be obtained, the feature depths may be adjusted to a uniform target feature depth, and the image texture features and the three-dimensional facial features may be stitched based on the target feature depth to obtain the facial features.
The facial features obtained by splicing may be identity features that are sensitive to the face shape, mainly because geometric constraints of the face shape are added to the three-dimensional facial features, as shown in fig. 4; the face shape constraint in the three-dimensional facial features is the constraint of the face shape in the face image of the source face.
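A minimal sketch of the direct splicing variant, assuming both inputs are flat per-image embeddings (the feature dimensions are illustrative):

```python
import torch

texture_features = torch.randn(1, 512)  # identity features from the face recognition network
face3d_features = torch.randn(1, 257)   # 3D face features (e.g. flattened 3DMM coefficients)

# Direct splicing (channel-wise concatenation) into the facial features.
face_features = torch.cat([texture_features, face3d_features], dim=1)
print(face_features.shape)  # torch.Size([1, 769])
```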
And S2, fusing the facial features and the attribute features by adopting the trained image processing model to obtain fused facial features.
For example, a decoder network of the trained image processing model may be used to decode the facial features and the attribute features, and the decoded facial features and attribute features are fused to obtain the fused facial features.
The structure of the decoder network may be various. For example, the decoder network may be a residual network stacked from multiple Res-Blocks, a network stacked from multiple Res-Blocks containing AdaIN (adaptive instance normalization, used for style transfer) layers, or a decoding block composed of multiple convolutional layers and activation layers.
The number of Res-blocks or the stack number of Res-blocks including AdaIN layers may be set according to practical applications, and may be, for example, 5 or any number.
And S3, generating the target face image based on the fused facial features.
For example, a face mask corresponding to the fused facial features may be constructed to obtain an initial face mask; the initial face mask, the fused facial features and the attribute features are fused to obtain target facial features; the target facial features are adjusted, and a face mask corresponding to the adjusted facial features is constructed to obtain a target face mask; and a target face image is generated based on the target face mask and the adjusted facial features. The details may be as follows:
(1) And constructing a face mask corresponding to the fused facial features to obtain an initial face mask.
For example, a semantic segmentation network may be used to extract features from the fused facial features, identify segmentation regions in the face image or the face template image based on the extracted semantic features, segment the face image or the face template image based on the segmentation regions, and perform occlusion processing on the segmented face image to obtain an initial face mask; or the semantic segmentation network may be used to extract features from the fused facial features and directly generate the initial face mask based on the extracted semantic features.
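A minimal sketch of producing an initial face mask from the fused features; the one-layer segmentation head and feature shapes are assumptions, as the patent only says a semantic segmentation network predicts the mask.

```python
import torch
import torch.nn as nn

# Assumed tiny segmentation head over the fused feature maps.
seg_head = nn.Conv2d(64, 2, kernel_size=1)  # 2 classes: background, face

fused_features = torch.randn(1, 64, 32, 32)
logits = seg_head(fused_features)
initial_face_mask = logits.softmax(dim=1)[:, 1:2]  # soft face-region mask in [0, 1]
```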
(2) And fusing the initial face mask, the fused facial features and the attribute features to obtain the target facial features.
For example, the attribute features may be subjected to feature transformation to obtain target attribute features of the object in the face template image, weighting parameters of the fused facial features and the target attribute features are determined according to the initial face mask, the fused facial features and the target attribute features are weighted according to the weighting parameters, and the weighted facial features and the weighted attribute features are fused to obtain the target facial features.
There are various ways of performing feature transformation on the attribute features, for example, one or more Res-blocks may be used to perform feature transformation on the attribute features to obtain target attribute features of an object in the face template image.
After feature conversion is performed on the attribute features, weighting parameters may be determined in various manners, for example, a region other than the initial face mask is screened out from the face image or the face template image to obtain a target image region, and the weighting parameters are determined according to the initial face mask and the target image region.
The weighted facial features and the weighted attribute features are then fused. The fusion process can be various; for example, the weighted facial features and the weighted attribute features can be directly added to obtain the target facial features, as shown in formula (2):

z_fuse = M_low ⊙ z_dec + (1 − M_low) ⊙ σ(z_enc)   (2)

where M_low is the initial face mask, z_dec is the fused facial features, z_enc is the attribute features, and σ is a Res-Block structure.
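A sketch of formula (2) on feature maps; the channel count, spatial size and the simplified σ transform are assumptions.

```python
import torch
import torch.nn as nn

def fuse_with_mask(m_low, z_dec, z_enc, sigma: nn.Module):
    """z_fuse = M_low * z_dec + (1 - M_low) * sigma(z_enc), per formula (2)."""
    return m_low * z_dec + (1.0 - m_low) * sigma(z_enc)

# sigma stands in for the Res-Block structure; simplified to two convs here.
sigma = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 64, 3, padding=1))
m_low = torch.rand(1, 1, 32, 32)    # initial face mask, broadcast over channels
z_dec = torch.randn(1, 64, 32, 32)  # fused facial features from the decoder
z_enc = torch.randn(1, 64, 32, 32)  # attribute features from the encoder
z_fuse = fuse_with_mask(m_low, z_dec, z_enc, sigma)
```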
(3) And adjusting the target facial features, and constructing a face mask corresponding to the adjusted facial features to obtain the target face mask.
For example, the target facial features may be adjusted by using an up-sampling structure to obtain the adjusted facial features, and a face mask corresponding to the adjusted facial features may be constructed by using a semantic segmentation network to obtain the target face mask.
For example, the size of the target facial features can be enlarged to a preset size by adopting one or more up-sampling structures composed of Res-Blocks to obtain the adjusted facial features, so that the resolution of the target facial features can be improved.
After the target facial features are adjusted, a face mask corresponding to the adjusted facial features can be constructed in various ways. For example, a semantic segmentation network can be used to extract features from the adjusted facial features, identify segmentation regions in the face image or the face template image based on the extracted semantic features, segment the face image or the face template image based on the segmentation regions, and perform occlusion processing on the segmented face image to obtain the target face mask; or the semantic segmentation network can be used to extract features from the adjusted facial features and directly generate the target face mask based on the extracted semantic features.
(4) Generating a target face image based on the target face mask and the adjusted facial features.
For example, an initial face image may be generated from the adjusted facial features, the image within the target face mask may be screened out from the initial face image to obtain a basic face image, the image outside the target face mask may be identified from the face template image to obtain a background image, and the basic face image and the background image may be fused to obtain the target face image.
The process of fusing the basic face image and the background image may be various, for example, the basic face image and the background image may be spliced to obtain the target face image, and specifically, the process may be as shown in formula (3):
I_r = M_r ⊙ I_out + (1 − M_r) ⊙ I_t   (3)

where I_r is the target face image, M_r is the target face mask, I_out is the initial face image, and I_t is the face template image.
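Formula (3) is a standard mask-based blend; a sketch with assumed image shapes:

```python
import numpy as np

def blend_face(mask_r, i_out, i_t):
    """I_r = M_r * I_out + (1 - M_r) * I_t, per formula (3)."""
    return mask_r * i_out + (1.0 - mask_r) * i_t

# Example with H x W x 3 images in [0, 1] and a single-channel mask.
h, w = 256, 256
mask_r = np.random.rand(h, w, 1)      # target face mask
i_out = np.random.rand(h, w, 3)       # initial face image from the generator
i_t = np.random.rand(h, w, 3)         # face template image (background source)
i_r = blend_face(mask_r, i_out, i_t)  # target face image
```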
Optionally, the sizes of the basic face image and the background image may be adjusted, and the size-adjusted basic face image and the size-adjusted background image are superimposed to obtain the target face image.
As shown in fig. 5, the process of generating the target face image based on the fused facial features may be as follows: a face semantic fusion module in the trained image processing model semantically fuses the fused facial features output by the decoder network, the shallow coding features (attribute features) and the initial face mask corresponding to the fused facial features to obtain the target facial features; the target facial features are then up-sampled for size adjustment, a target face mask is generated based on the adjusted facial features, and the target face image can be generated based on the target face mask and the adjusted facial features.
The trained image processing model may be set according to the requirements of the practical application. In addition, it should be noted that the trained image processing model may be set in advance by a maintenance person, or may be trained by the image processing apparatus; that is, before the step "fusing the facial features and the attribute features by adopting the trained image processing model to obtain fused facial features", the image processing method may further include:
the method comprises the steps of obtaining a face image sample set, screening out at least one image sample pair from the face image sample set, wherein the image sample pair comprises a face image sample and a face template image sample, replacing an object in the face template image sample with an object in the face image sample by adopting a preset image processing model to obtain a predicted face image, converging the preset image processing model based on the image sample pair and the predicted face image to obtain a trained image processing model, and specifically comprising the following steps:
(1) and acquiring a facial image sample set, and screening at least one image sample pair in the facial image sample set.
Wherein the image sample pair comprises a face image sample and a face template image sample.
The manner of acquiring the face image sample set may be various, and specifically may be as follows:
for example, a plurality of original face image samples may be acquired, the original face image samples are preprocessed to obtain a face image sample set, two face image samples are arbitrarily screened out from the face image sample set, and any one image in the face image samples is designated as a face image sample, and then another image sample may be a face template image sample, so as to obtain an image sample pair.
For example, faces in the original image samples may be aligned to a uniform position by using facial key point registration, and the sizes of the aligned original image samples are adjusted to a preset size to obtain the image sample set. The preset size may be set according to the actual application, for example, 256 × 256 or another size.
(2) And replacing the object in the face template image sample with the object in the face image sample by adopting a preset image processing model to obtain a predicted face image.
For example, a preset image processing model may be used to perform feature extraction on the face image sample and the face template image sample to obtain image sample texture features of the face image sample and sample attribute features of the object in the face template image sample; sample three-dimensional modeling parameters are respectively identified in the face image sample and the face template image sample, and the identified sample three-dimensional modeling parameters are fused to obtain target sample three-dimensional modeling parameters; a sample three-dimensional face image is constructed according to the target sample three-dimensional modeling parameters to obtain sample three-dimensional face features of the sample three-dimensional face image, and the sample three-dimensional face features, the image sample texture features and the sample attribute features are fused to obtain fused sample facial features; a face mask corresponding to the fused sample facial features is constructed to obtain an initial sample face mask, and a predicted face image is generated based on the initial sample face mask, the fused sample facial features and the sample attribute features.
The steps of extracting features from the face image sample and the face template image sample, identifying the sample three-dimensional modeling parameters, fusing the sample three-dimensional face features, the image sample texture features and the sample attribute features, and constructing the face mask corresponding to the fused sample facial features can be referred to above, and are not repeated herein.
The method for generating the predicted face image may be various, for example, the initial sample face mask, the fused sample face features, and the sample attribute features may be fused to obtain the target sample face features, the target sample face features may be adjusted, the face mask corresponding to the adjusted sample face features may be constructed to obtain the target sample face mask, and the predicted face image may be generated based on the target sample face mask and the adjusted sample face features, which may be referred to above specifically, and thus, the description is omitted here.
(3) And converging the preset image processing model based on the image sample pair and the predicted face image to obtain the trained image processing model.
For example, an initial predicted face image may be generated based on the target sample face features and the initial sample face mask, shape loss information of the image sample pair may be determined from the sample three-dimensional face image, the initial predicted face image, and the predicted face image, face similarities of the face image sample of the image sample pair with the predicted face image and the initial predicted face image may be calculated, respectively, to obtain image loss information of the image sample pair, segmentation loss information of the image sample pair may be determined based on the initial sample face mask and the target sample face mask, face loss information of the image sample pair may be determined from the image sample pair, the predicted face image, and the initial predicted face image, the shape loss information, the image loss information, the segmentation loss information, and the face loss information may be fused, and the preset image processing model may be converged based on the fused loss information, obtaining a trained image processing model, which may specifically be as follows:
c1, generating an initial prediction face image based on the target sample face features and the initial sample face mask.
For example, an initial face sample image may be generated according to the target sample facial features, the image within the initial sample face mask may be screened out from the initial face sample image to obtain a basic face sample image, the image outside the initial sample face mask may be screened out from the face template image sample to obtain a background sample image, and the background sample image and the basic face sample image are fused to obtain the initial predicted face image.
C2, determining the shape loss information of the image sample pair according to the sample three-dimensional face image, the initial predicted face image and the predicted face image.
For example, first projection information of a sample three-dimensional face image may be acquired, first position information of a face contour may be extracted from the first projection information, an initial predicted face image and a target three-dimensional face image corresponding to the predicted face image may be constructed, second projection information of the target three-dimensional face image may be acquired, second position information of the face contour in the initial predicted face image and third position information of the face contour in the predicted face image may be extracted from the second projection information, and a distance between the face contours may be calculated based on the first position information, the second position information, and the third position information, respectively, to obtain shape loss information of the image sample pair.
The manner of obtaining the first projection information of the sample three-dimensional face image may be various, for example, a three-dimensional renderer may be used to obtain a 2D projection of the sample three-dimensional face image under a preset angle coefficient to obtain the first projection information, where the three-dimensional renderer may be various, for example, a pytorch3D (a three-dimensional renderer) or other three-dimensional renderers.
After the first projection information is acquired, the first position information of the face contour may be extracted from the first projection information, and there may be various manners of extraction, for example, positions of a plurality of contour points of the face contour may be acquired in the 2D projection, so as to obtain the first position information of the face contour, and the number of the contour points may be 18 or other numbers.
For example, the initial predicted face image and the target three-dimensional face image corresponding to the predicted face image may be reconstructed by using a face three-dimensional reconstruction model, and the reconstruction method may be referred to above, and thus, the details are not repeated here.
After the target three-dimensional face images are constructed, second projection information of the target three-dimensional face images can be obtained, second position information of the face contour in the initial predicted face image is extracted from the second projection information, and third position information of the face contour in the predicted face image is likewise extracted from the second projection information.
After the second position information and the third position information of the face contour are extracted, the distances between the face contours may be calculated to obtain the shape loss information of the image sample pair. The calculation may be performed in various manners; for example, the position difference between contour points of the same face contour in the first position information and the second position information may be calculated, the position difference between contour points of the same face contour in the first position information and the third position information may be calculated, and the average value of the position differences may then be taken, so as to obtain the shape loss information of the image sample pair, which may be specifically shown in equation (4):

L_shape = (1/N) Σ_{i=1}^{N} ( ‖p_i − p_i^{low}‖ + ‖p_i − p_i^{r}‖ )    (4)

where L_shape is the shape loss information, N is the number of contour points of the face contour, p_i is the position of the i-th contour point of the face contour in the first position information, p_i^{low} is the position of the same contour point in the second position information, and p_i^{r} is the position of the same contour point in the third position information.
The shape loss information of the image sample pair may be regarded as the distance between the face contour of the sample three-dimensional face image and the face contours of the three-dimensional face models corresponding to the initial predicted face image and the predicted face image, as may be specifically shown in fig. 6.
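A minimal sketch of the shape loss of equation (4), assuming the three sets of contour-point positions have already been obtained from the 2D projections as (N, 2) tensors (the names are illustrative):

```python
import torch

def shape_loss(p_src, p_low, p_pred):
    """Average contour-point distance of equation (4).

    p_src, p_low and p_pred are assumed to be (N, 2) tensors holding the
    first, second and third position information respectively; producing
    them with a renderer such as PyTorch3D is not shown here.
    """
    d_low = (p_src - p_low).norm(dim=-1)    # sample 3D face vs. initial prediction
    d_pred = (p_src - p_pred).norm(dim=-1)  # sample 3D face vs. prediction
    return (d_low + d_pred).mean()          # average over the N contour points
```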
And C3, respectively calculating the face similarity of the face image sample in the image sample pair with the predicted face image and the initial predicted face image to obtain the image loss information of the image sample pair.
For example, a pre-trained face recognition feature extractor may be used to extract the facial features of the face image sample, the predicted face image, and the initial predicted face image, and the cosine similarity between the facial features of the face image sample and the predicted face image may be calculated, so as to obtain a first face similarity between the face image sample and the predicted face image, which may be specifically shown in formula (5):

L_id1 = 1 − cos(v_id(I_s), v_id(I_r))    (5)

where L_id1 is the first face similarity, v_id is the pre-trained face recognition feature extractor, I_s is the face image sample, and I_r is the predicted face image.
A second face similarity between the face image sample and the initial predicted face image is calculated in the same way; the specific calculation process is referred to above and is not repeated here. Then, the first face similarity and the second face similarity are fused to obtain the image loss information. The fusion may be performed in various ways; for example, the first face similarity and the second face similarity may be directly added to obtain the image loss information, which may be specifically shown in formula (6):

L_id = (1 − cos(v_id(I_s), v_id(I_r))) + (1 − cos(v_id(I_s), v_id(I_low)))    (6)

where L_id is the image loss information, v_id is the pre-trained face recognition feature extractor, I_s is the face image sample, I_r is the predicted face image, and I_low is the initial predicted face image.
Optionally, the fusion may instead obtain weighting parameters of the first face similarity and the second face similarity, weight the two similarities according to the weighting parameters, and fuse the weighted first face similarity and second face similarity to obtain the image loss information.
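A sketch of the image loss of equations (5) and (6), including the optional weighted fusion, might look as follows; v_id stands for any pre-trained face recognition feature extractor, which is an assumption rather than a concrete model from the patent:

```python
import torch.nn.functional as F

def image_loss(v_id, I_s, I_r, I_low, w_r=1.0, w_low=1.0):
    """Equations (5)-(6): identity similarity of the two predictions.

    v_id is assumed to return one embedding per image; w_r and w_low
    default to the plain, unweighted sum of equation (6).
    """
    e_s, e_r, e_low = v_id(I_s), v_id(I_r), v_id(I_low)
    l_id1 = 1.0 - F.cosine_similarity(e_s, e_r, dim=-1)    # equation (5)
    l_id2 = 1.0 - F.cosine_similarity(e_s, e_low, dim=-1)  # second face similarity
    return (w_r * l_id1 + w_low * l_id2).mean()
```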
And C4, determining the segmentation loss information of the image sample pair based on the initial sample face mask and the target sample face mask.
For example, a template image mask of a face template image sample in an image sample pair is obtained, the size of the template image mask is adjusted to obtain an adjusted template image mask, size differences between the adjusted template image mask and the face masks of the initial sample and the target sample are respectively calculated, and the size differences are fused to obtain segmentation loss information of the image sample pair.
The template image mask of the face template image sample in the image sample pair can be obtained in various ways; for example, it can be predicted by using a trained semantic segmentation network, or mask features can be extracted from the face template image sample, and the mask corresponding to the face template image sample can be screened out from a preset mask set according to the mask features, so as to obtain the template image mask.
After the template image mask is obtained, the size of the template image mask may be adjusted in various ways, for example, in order to meet the requirement of changing the facial form of the face, the template image mask may be expanded outward by a predetermined number of pixels, for example, 15 pixels or another number of pixels.
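The outward expansion of the mask can be realised, for example, with a morphological dilation; the following OpenCV sketch assumes a grayscale mask image and uses an elliptical structuring element of roughly 15-pixel radius (the file path and kernel shape are illustrative):

```python
import cv2

mask = cv2.imread("template_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
# An elliptical 31x31 kernel expands the mask outward by about 15 pixels.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31, 31))
adjusted_mask = cv2.dilate(mask, kernel)
```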
After obtaining the adjusted template image mask, the size differences between the adjusted template image mask and the initial sample face mask and between the template image mask and the target sample face mask may be respectively calculated, and the size differences fused, so as to obtain the segmentation loss information of the image sample pair, which may be specifically shown in formula (7):

L_seg = ‖R(M_tar) − M_low‖₁ + ‖M_tar − M_r‖₁    (7)

where L_seg is the segmentation loss information, M_tar is the template image mask, R(M_tar) is the adjusted template image mask, M_low is the initial sample face mask, and M_r is the target sample face mask.
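Under the assumption that the masks are float tensors of shape (N, 1, H, W) and that R(·) is a simple rescaling to the initial mask's resolution, equation (7) may be sketched as:

```python
import torch.nn.functional as F

def segmentation_loss(M_tar, M_low, M_r):
    """Equation (7) as two L1 penalties between masks.

    Treating R(.) as a bilinear down-scaling of the template image mask
    to the initial sample face mask's resolution is our assumption.
    """
    R_M_tar = F.interpolate(M_tar, size=M_low.shape[-2:], mode="bilinear",
                            align_corners=False)
    return (R_M_tar - M_low).abs().mean() + (M_tar - M_r).abs().mean()
```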
C5, determining face loss information of the image sample pair according to the image sample pair, the predicted face image and the initial predicted face image.
For example, the similarities of the image sample pair with respect to the predicted face image and the initial predicted face image may be calculated to obtain the similarity loss information of the image sample pair, the adversarial loss information and the cycle loss information of the image sample pair may be determined from the face template image sample, the predicted face image, and the initial predicted face image, and the similarity loss information, the adversarial loss information, and the cycle loss information may be taken as the face loss information of the image sample pair.
For example, when the objects in the face image sample and the face template image sample are the same object, the spatial similarities between the face template image sample and the predicted face image and between the face template image sample and the initial predicted face image are respectively calculated to obtain the spatial similarity loss information of the image sample pair; the image features of the face template image sample, the predicted face image, and the initial predicted face image are extracted, and the feature similarity between the image features is calculated to obtain the feature similarity loss information of the image sample pair; the spatial similarity loss information and the feature similarity loss information are used as the similarity loss information of the image sample pair.
The spatial similarity may be understood as a similarity constraint of the face image sample, the predicted face image and the initial predicted face image in an image RGB (color channel) space, and there may be multiple ways of calculating the spatial similarity, for example, a first spatial similarity between the face image sample and the predicted face image and a second spatial similarity between the face image sample and the initial predicted face image may be calculated by using an L1 norm, and the first spatial similarity and the second spatial similarity are fused to obtain spatial similarity loss information of the image sample pair, which may be specifically represented by formula (8):
L_rec = ‖I_r − I_t‖₁ + ‖I_low − R(I_t)‖₁    (8)

where L_rec is the spatial similarity loss information, I_r is the predicted face image, I_t is the face template image sample, I_low is the initial predicted face image, and R(I_t) is the face template image sample scaled to the same size (resolution) as the initial predicted face image.
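A sketch of equation (8), treating R(·) as a bilinear rescaling of the face template image sample to the initial prediction's resolution (an assumption made for illustration):

```python
import torch.nn.functional as F

def spatial_similarity_loss(I_r, I_low, I_t):
    """Equation (8): L1 similarity constraints in RGB space.

    I_t and I_r are assumed to share one resolution; R(.) rescales I_t
    to the smaller resolution of the initial predicted face image I_low.
    """
    R_I_t = F.interpolate(I_t, size=I_low.shape[-2:], mode="bilinear",
                          align_corners=False)
    return (I_r - I_t).abs().mean() + (I_low - R_I_t).abs().mean()
```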
For example, the feature extraction network may be adopted to extract image features of a face template image sample, a predicted face image and an initial predicted face image, obtain image features output by each feature layer of the feature extraction network, calculate feature difference values of image features output by the face template image sample and the predicted face image in the same feature layer, and then fuse the feature difference values and feature sizes of the image features, thereby obtaining a first feature similarity, which may be specifically shown as formula (9):
L_perceptual = Σ_i (1 / (C_i H_i W_i)) · ‖F_i(I_r) − F_i(I_t)‖₁    (9)

where L_perceptual is the first feature similarity, C_i H_i W_i is the size of the image feature of the i-th layer, F_i(I_r) is the image feature of the predicted face image at the i-th layer, and F_i(I_t) is the image feature of the face template image sample at the i-th layer.
The feature difference of the image features output by the face template image sample and the initial predicted face image at the same feature layer is calculated in the same way to obtain a second feature similarity; the specific process can be referred to above and is not repeated here. The first feature similarity and the second feature similarity are fused to obtain the feature similarity loss information of the image sample pair. The fusion may be performed in various ways; for example, the first feature similarity and the second feature similarity may be directly added to obtain the feature similarity loss information, or weighting coefficients of the first feature similarity and the second feature similarity may be obtained, the two similarities weighted according to the weighting coefficients, and the weighted first feature similarity and second feature similarity fused to obtain the feature similarity loss information.
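The first feature similarity of equation (9) may be sketched as follows, assuming the per-layer feature maps have already been collected from the feature extraction network:

```python
def perceptual_loss(feats_a, feats_b):
    """First feature similarity of equation (9).

    feats_a and feats_b are assumed to be lists of per-layer feature maps
    of shape (N, C_i, H_i, W_i) from the same network, e.g. for the
    predicted face image and the face template image sample.
    """
    loss = 0.0
    for f_a, f_b in zip(feats_a, feats_b):
        # The element-wise mean realises the 1/(C_i*H_i*W_i) normalisation
        # of the L1 difference, averaged over the batch as well.
        loss = loss + (f_a - f_b).abs().mean()
    return loss
```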
For example, an adversarial network may be adopted to calculate first adversarial parameters of the face template image sample and the predicted face image, and the first adversarial loss information may be determined based on the first adversarial parameters, which may be specifically shown in formula (10):

L_adv = E_{I_t}[log D(I_t)] + E_{I_t,I_s}[log(1 − D(G(I_t, I_s)))]    (10)

where L_adv is the first adversarial loss information, E_{I_t}[·] denotes the expectation over the face template image samples I_t, E_{I_t,I_s}[·] denotes the expectation over the generated predicted face images, G(I_t, I_s) is the predicted face image, and D(I_t) and D(G(I_t, I_s)) are the first adversarial parameters.
Second adversarial parameters of the face template image sample and the initial predicted face image are likewise calculated with the adversarial network, and second adversarial loss information is determined based on the second adversarial parameters; the determination process is the same as above. The first adversarial loss information and the second adversarial loss information are then fused to obtain the adversarial loss information. The fusion may be performed in various ways; for example, the first adversarial loss information and the second adversarial loss information may be directly added to obtain the adversarial loss information, or weighting parameters of the first adversarial loss information and the second adversarial loss information may be obtained, the two weighted according to the weighting parameters, and the weighted first adversarial loss information and second adversarial loss information fused to obtain the adversarial loss information.
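A sketch of the adversarial terms of equation (10) in the standard log form; the assumption that the discriminator D outputs probabilities in (0, 1), and the non-saturating generator term, are ours:

```python
import torch

def adversarial_losses(D, I_t, I_fake):
    """Log-form adversarial objective matching equation (10).

    I_fake stands for a predicted face image G(I_t, I_s); calling this
    with the initial predicted face image instead yields the second
    adversarial loss information.
    """
    eps = 1e-8  # numerical safety for the logarithms
    loss_D = -(torch.log(D(I_t) + eps)
               + torch.log(1.0 - D(I_fake.detach()) + eps)).mean()
    loss_G = -torch.log(D(I_fake) + eps).mean()  # non-saturating generator term
    return loss_D, loss_G
```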
For example, the L1 norm between the face template image sample and the image obtained by processing the predicted face image again may be calculated, so as to obtain the cycle loss information of the image sample pair, which may be specifically shown in formula (11):
L_cyc = ‖I_t − G(I_r, I_t)‖₁    (11)

where L_cyc is the cycle loss information, I_t is the face template image sample, I_r is the predicted face image, and G is the image processing function.
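Assuming G follows the G(template, source) signature used in equation (10), the cycle loss of equation (11) reduces to one L1 term:

```python
def cycle_loss(G, I_r, I_t):
    """Equation (11): swapping the template face back onto the predicted
    face image should reproduce the face template image sample."""
    return (I_t - G(I_r, I_t)).abs().mean()
```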
And C6, fusing the shape loss information, the image loss information, the segmentation loss information and the face loss information, and converging the preset image processing model based on the fused loss information to obtain the trained image processing model.
For example, the first loss information is obtained by fusing the shape loss information and the image loss information, the second loss information is obtained by fusing the segmentation loss information and the face loss information, and the post-fusion loss information is obtained by fusing the first loss information and the second loss information. And converging the preset image processing model based on the fused loss information to obtain the trained image processing model.
For example, a first weighting parameter of the shape loss information and the image loss information may be obtained, the shape loss information and the image loss information are weighted according to the first weighting parameter, and the weighted shape loss information and the weighted image loss information are fused to obtain first loss information, which may be specifically shown in formula (12):
L_sid = λ_shape · L_shape + λ_id · L_id    (12)

where L_sid is the first loss information, λ_shape and λ_id are the first weighting parameters, L_shape is the shape loss information, and L_id is the image loss information. The first weighting parameters may be set according to the practical application; for example, the first weighting parameter of the shape loss information may be 5 or any other value, and the first weighting parameter of the image loss information may be 0.5 or another value.
For example, a second weighting parameter of the segmentation loss information and the face loss information may be obtained, the segmentation loss information and the face loss information are weighted according to the second weighting parameter, and the weighted segmentation loss information and the weighted face loss information are fused to obtain second loss information, which may be specifically as shown in formula (13):
L_real = L_adv + λ_0·L_seg + λ_1·L_rec + λ_2·L_cyc + λ_3·L_lpips    (13)

where L_real is the second loss information, L_adv is the adversarial loss information, L_seg is the segmentation loss information, L_rec is the spatial similarity loss information, L_cyc is the cycle loss information, and L_lpips is the feature similarity loss information (L_perceptual in formula (9)). λ_0, λ_1, λ_2, and λ_3 are the second weighting parameters, which may be set according to the practical application; for example, λ_0 may be 100, λ_1 may be 20, λ_2 may be 1, and λ_3 may be 5, but any other values may be used.
After the first loss information and the second loss information are obtained, the first loss information and the second loss information may be fused to obtain fused loss information, and the fusion manner may also be multiple, for example, the first loss information and the second loss information may be directly added to obtain the fused loss information, which may be specifically shown in formula (14):
L = L_sid + L_real    (14)

where L is the post-fusion loss information, L_sid is the first loss information, and L_real is the second loss information.
After the post-fusion loss information is obtained, the preset image processing model may be converged based on the post-fusion loss information to obtain the trained image processing model. The convergence may be performed in various ways; for example, a gradient descent algorithm, or any other convergence algorithm, may be used to update the network parameters of the preset image processing model according to the post-fusion loss information until the preset image processing model converges, thereby obtaining the trained image processing model.
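The fusion of equations (12)-(14) may be sketched as below, with the example weights quoted above; the dict-of-scalars interface is an assumption made for illustration. The returned scalar would then be back-propagated and the network parameters updated with a gradient descent optimizer:

```python
def fused_loss(losses,
               lam_shape=5.0, lam_id=0.5,                   # first weighting parameters
               lam0=100.0, lam1=20.0, lam2=1.0, lam3=5.0):  # second weighting parameters
    """Equations (12)-(14).

    `losses` is assumed to be a dict of scalar tensors with keys
    'shape', 'id', 'adv', 'seg', 'rec', 'cyc' and 'lpips'.
    """
    L_sid = lam_shape * losses["shape"] + lam_id * losses["id"]        # eq. (12)
    L_real = (losses["adv"] + lam0 * losses["seg"] + lam1 * losses["rec"]
              + lam2 * losses["cyc"] + lam3 * losses["lpips"])         # eq. (13)
    return L_sid + L_real                                              # eq. (14)
```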
The trained image processing model as a whole adopts a scheme based on a generative adversarial network, provides two new modules for the generator on this basis, and enhances the face shape change and the sense of realism by means of a three-dimensional method and semantic segmentation. As shown in fig. 7, the generator of the model mainly comprises four parts: an encoder, a decoder, a face-sensitive facial feature extractor, and a facial semantic fusion module. The encoder extracts the attribute features of the face template image, and the decoder fuses the facial features and the attribute features; the face-sensitive facial feature extractor generates the facial features, and the facial semantic fusion module generates the target face image, completing the face-swapping process of replacing the object in the face template image with the source object.
Taking the face image being a human face image as an example, the target face-swapped image is obtained by processing the face image and the face template image according to this scheme, and its image effect can be directly compared with the image effects of the face-swapped images obtained by the prior art (FaceSwap, FaceShift, and SimSwap); the comparison results may be as shown in fig. 8. A quantitative face identification experiment was performed on the target face-swapped image and the face-swapped images of the prior art; the experimental data are shown in Table (1). The experiments show that the target face-swapped image obtained by this scheme preserves the shape of the source face and the target attributes well and has higher image quality.
Table (1)

Method         ID↑      Pose↓    Shape↓
Prior art 1    54.19    2.51     0.610
Prior art 2    97.38    2.96     0.511
Prior art 3    92.83    1.53     0.540
This scheme    98.48    2.63     0.540
As can be seen from the above, in the embodiments of the present application, after the face image of the source face and the face template image of the template face are obtained, feature extraction is performed on the face image and the face template image to obtain the image texture feature of the source object and the attribute feature of the object in the face template image, then, according to the face image and the face template image, face modeling is performed on the source object and the object in the face template image to obtain the first three-dimensional modeling parameter of the source object and the second three-dimensional modeling parameter of the object in the face template image, the first three-dimensional modeling parameter and the second three-dimensional modeling parameter are fused to obtain the target three-dimensional modeling parameter, a three-dimensional face image is constructed according to the target three-dimensional modeling parameter to obtain the three-dimensional face feature of the three-dimensional face image, and then, the object in the face template image is replaced by the source object based on the image texture feature, the three-dimensional face feature and the attribute feature, obtaining a target face image; according to the scheme, three-dimensional modeling parameters can be identified in the face image and the face template image, the three-dimensional face image is constructed based on the three-dimensional modeling parameters, so that three-dimensional face features are obtained, the face features are geometrically constrained, the three-dimensional face features, image texture features and attribute features are fused, the result of the replaced face can be more real, and therefore the accuracy of face image processing can be improved.
The method described in the above examples is further illustrated in detail below by way of example.
In this embodiment, the image processing apparatus is specifically integrated in an electronic device, the electronic device is a server, and a face image is a human face image.
The server trains a preset image processing model to obtain a trained image processing model
(1) The server obtains a face image sample set, and at least one image sample pair is screened out from the face image sample set.
For example, a plurality of original face image samples may be obtained, the faces in the original face image samples aligned to a uniform position by using face key point registration, and the sizes of the aligned original face image samples adjusted to 256 × 256, so as to obtain the face image sample set. Two image samples are arbitrarily screened out from the face image sample set; either one is designated as the face image sample, and the other then serves as the face template image sample, thereby obtaining an image sample pair.
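A sketch of this preparation step, assuming key-point registration has already been applied and taking file paths as a placeholder input:

```python
import random
import cv2

def draw_image_sample_pair(aligned_paths):
    """Resize aligned face images to 256 x 256 and draw one
    (face image sample, face template image sample) pair at random."""
    samples = [cv2.resize(cv2.imread(p), (256, 256)) for p in aligned_paths]
    face_sample, template_sample = random.sample(samples, 2)
    return face_sample, template_sample
```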
(2) And the server adopts a preset image processing model to replace the object in the face template image sample with the object in the face image sample to obtain a predicted face image.
For example, the server may perform feature extraction on the face image sample and the face template image sample by using the preset image processing model to obtain the image sample texture features of the face image sample and the sample attribute features of the object in the face template image sample, respectively identify the sample three-dimensional modeling parameters in the face image sample and the face template image sample, and fuse the identified sample three-dimensional modeling parameters to obtain the target sample three-dimensional modeling parameters. A sample three-dimensional face image is constructed according to the target sample three-dimensional modeling parameters to obtain the sample three-dimensional facial features of the sample three-dimensional face image; the sample three-dimensional facial features, the image sample texture features, and the sample attribute features are fused to obtain the fused sample facial features, and the face mask corresponding to the fused sample facial features is constructed to obtain an initial sample face mask. The initial sample face mask, the fused sample facial features, and the sample attribute features are then fused to obtain the target sample facial features, the target sample facial features are adjusted, the face mask corresponding to the adjusted sample facial features is constructed to obtain a target sample face mask, and the predicted face image is generated based on the target sample face mask and the adjusted sample facial features.
(3) And the server converges the preset image processing model based on the image sample pair and the predicted face image to obtain the trained image processing model.
For example, the server may generate an initial predicted face image based on the target sample facial features and the initial sample face mask, determine the shape loss information of the image sample pair according to the sample three-dimensional face image, the initial predicted face image, and the predicted face image, calculate the face similarities of the face image sample with the predicted face image and the initial predicted face image to obtain the image loss information of the image sample pair, determine the segmentation loss information of the image sample pair based on the initial sample face mask and the target sample face mask, determine the face loss information of the image sample pair according to the image sample pair, the predicted face image, and the initial predicted face image, fuse the shape loss information, the image loss information, the segmentation loss information, and the face loss information, and converge the preset image processing model based on the fused loss information to obtain the trained image processing model, which may specifically be as follows:
D1, the server generates an initial predicted face image based on the target sample facial features and the initial sample face mask.
For example, the server may generate an initial face sample image according to the target sample facial features, screen out images in the initial sample face mask from the candidate face images to obtain a basic face sample image, screen out images other than the initial sample face mask from the face template image sample to obtain a background sample image, and fuse the background sample image and the basic face sample image to obtain an initial predicted face image.
D2, the server determines the shape loss information of the image sample pair according to the sample three-dimensional face image, the initial predicted face image, and the predicted face image.
For example, the server may obtain, by using the three-dimensional renderer pytorch3D, a 2D projection of the sample three-dimensional face image under a preset angle coefficient to obtain first projection information, and obtain, in the 2D projection, positions of 18 contour points of the face contour to obtain first position information of the face contour.
The server adopts a face three-dimensional reconstruction model to reconstruct the target three-dimensional face images corresponding to the initial predicted face image and the predicted face image, obtains second projection information of the target three-dimensional face images, extracts second position information of the face contour in the initial predicted face image from the second projection information, and likewise extracts third position information of the face contour in the predicted face image from the second projection information.
The server calculates a position difference between contour points of the same face contour in the first position information and the second position information, calculates a position difference between contour points of the same face contour in the first position information and the third position information, and then calculates an average value of the position differences, thereby obtaining shape loss information of the image sample pair, which may be specifically shown in formula (4).
D3, the server calculates the face similarity of the face image sample in the image sample pair with the predicted face image and the initial predicted face image respectively, to obtain the image loss information of the image sample pair.
For example, the server may extract facial features in the facial image sample, the predicted facial image, and the initial predicted facial image using a pre-trained facial recognition feature extractor, calculate a cosine similarity between the facial image sample and the facial features of the predicted facial image, thereby obtaining a first facial similarity between the facial image sample and the predicted facial image, calculate a second facial similarity between the facial image sample and the initial predicted facial image, and directly add the first facial similarity and the second facial similarity, thereby obtaining image loss information, which may be specifically represented by equation (6).
D4, the server determines segmentation loss information for the image sample pairs based on the initial sample face mask and the target sample face mask.
For example, the server may use the trained semantic segmentation network to predict the template image mask of the face template image sample in the image sample pair, or may extract mask features from the face template image sample and screen out the mask corresponding to the face template image sample from the preset mask set according to the mask features, so as to obtain the template image mask. The template image mask is expanded outward by 15 pixels to obtain the adjusted template image mask. The size differences between the adjusted template image mask and the initial sample face mask and the target sample face mask are then respectively calculated, and the size differences are fused to obtain the segmentation loss information of the image sample pair, which may be specifically shown in formula (7).
D5, the server determines the face loss information of the image sample pair according to the image sample pair, the predicted face image and the initial predicted face image.
For example, when the object in the face image sample and the face template image sample is the same object, the server may calculate a first spatial similarity between the face image sample and the predicted face image and a second spatial similarity between the face image sample and the initial predicted face image by using the L1 norm, and fuse the first spatial similarity and the second spatial similarity to obtain spatial similarity loss information of the image sample pair, which may be specifically shown in equation (8).
The server can also adopt a feature extraction network to extract the image features of the face template image sample, the predicted face image, and the initial predicted face image to obtain the image features output by each feature layer of the feature extraction network, calculate the feature difference value of the image features output by the face template image sample and the predicted face image at the same feature layer, and then fuse the feature difference value with the feature size of the image features to obtain a first feature similarity, specifically as shown in formula (9). The feature difference value of the image features output by the face template image sample and the initial predicted face image at the same feature layer is calculated likewise to obtain a second feature similarity. The first feature similarity and the second feature similarity may be directly added to obtain the feature similarity loss information, or weighting coefficients of the first feature similarity and the second feature similarity may be obtained, the two similarities weighted according to the weighting coefficients, and the weighted first feature similarity and second feature similarity fused to obtain the feature similarity loss information. The spatial similarity loss information and the feature similarity loss information are used as the similarity loss information of the image sample pair.
The server may respectively calculate first adversarial parameters of the face template image sample and the predicted face image by adopting an adversarial network and determine the first adversarial loss information based on the first adversarial parameters, specifically as shown in formula (10), and respectively calculate second adversarial parameters of the face template image sample and the initial predicted face image by adopting the adversarial network and determine the second adversarial loss information based on the second adversarial parameters. The first adversarial loss information and the second adversarial loss information are added to obtain the adversarial loss information, or weighting parameters of the first adversarial loss information and the second adversarial loss information are obtained, the two weighted according to the weighting parameters, and the weighted first adversarial loss information and second adversarial loss information fused to obtain the adversarial loss information.
The server may further calculate the L1 norm between the face template image sample and the image obtained by processing the predicted face image again, so as to obtain the cycle loss information of the image sample pair, which may be specifically shown in equation (11).
After the spatial similarity loss information, the feature similarity loss information, the adversarial loss information, and the cycle loss information are obtained, they may be used as the face loss information of the image sample pair.
D6, fusing the shape loss information, the image loss information, the segmentation loss information and the face loss information by the server, and converging the preset image processing model based on the fused loss information to obtain the trained image processing model.
For example, the server obtains a first weighting parameter of the shape loss information and the image loss information, weights the shape loss information and the image loss information according to the first weighting parameter, and fuses the weighted shape loss information and the weighted image loss information to obtain first loss information, which may be specifically represented by formula (12).
The server obtains a second weighting parameter of the segmentation loss information and the face loss information, weights the segmentation loss information and the face loss information according to the second weighting parameter, and fuses the weighted segmentation loss information and the face loss information to obtain second loss information, which may be specifically represented by formula (13).
The first loss information and the second loss information are added to obtain the fused loss information, which may be specifically represented by formula (14).
After the server obtains the post-fusion loss information, it may update the network parameters of the preset image processing model according to the post-fusion loss information by using a gradient descent algorithm, or any other convergence algorithm, so as to converge the preset image processing model and obtain the trained image processing model.
As shown in fig. 9, a specific flow of an image processing method is as follows:
201. the server obtains a face image of the source face and a face template image of the template face.
For example, the server may directly receive an original face image uploaded by a user and image processing information corresponding to the original face image, and screen a face image of a source face and a face template image of a template face from the original face image according to the image processing information, or acquire a face image pair from an image database or a network, screen any one face image from the face image pair as the face image of the source face, and use the other face image from the face image pair as the face template image of the template face.
When the number of face images and face template images is large, or they occupy a large amount of memory, the server may receive an image processing request sent by a terminal, the image processing request carrying the storage address and the image processing information of the original face images. The original face images are obtained from the memory, a cache, or a third-party database according to the storage address; the sizes of the original face images are adjusted to a preset size, or the face objects in the original face images may be aligned to a uniform position by using face key point registration, to obtain processed face images; and the face image of the source face and the face template image of the template face are screened out from the processed face images according to the image processing information.
202. The server extracts the characteristics of the face image and the face template image to obtain the image texture characteristics of the source object and the attribute characteristics of the object in the face template image.
For example, the server may perform feature coding on the face template image by using an encoder network (formed by stacking 8 Res-blocks) of the trained image processing model to obtain attribute features of the object in the face template image, and perform feature extraction on the face image by using a face recognition network of the trained image processing model to obtain image texture features of the source object.
203. The server carries out face modeling on the source object and the object in the face template image according to the face image and the face template image so as to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and the first three-dimensional modeling parameter and the second three-dimensional modeling parameter are fused to obtain a target three-dimensional modeling parameter.
For example, the server performs regression on the face image and the face template image using the three-dimensional face reconstruction model, thereby performing face modeling on the source object and the object in the face template image, directly extracting a first three-dimensional modeling parameter from the face model of the source object, and extracting a second three-dimensional modeling parameter from the face model of the object in the face template image, or may convert the face model into the three-dimensional modeling parameters, thereby obtaining the first three-dimensional modeling parameter and the second three-dimensional modeling parameter.
The server extracts a face shape parameter corresponding to the face image from the first three-dimensional modeling parameters, extracts a face action parameter corresponding to the face template image from the second three-dimensional modeling parameters, and directly splices and combines the face shape parameter and the face action parameter, or obtains a weighting parameter and a basic modeling parameter of the face shape parameter and the face action parameter, weights the face shape parameter and the face action parameter according to the weighting parameter, and fuses the weighted face shape parameter and the weighted face action parameter with the basic modeling parameter, so as to obtain a target three-dimensional modeling parameter, which can be specifically shown in formula (1).
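By way of illustration, the parameter fusion of step 203 may be sketched as a simple recombination of coefficient blocks; the flat [shape | action] layout and the 80-dimensional shape part are assumptions, not the patent's actual parameterisation:

```python
import torch

def fuse_modeling_params(src_params, tpl_params, shape_dim=80):
    """Target 3D modeling parameters: face shape from the source face,
    face action (expression/pose) from the template face."""
    shape = src_params[..., :shape_dim]    # face shape parameter of the face image
    action = tpl_params[..., shape_dim:]   # face action parameter of the template
    return torch.cat([shape, action], dim=-1)
```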
204. And the server constructs a three-dimensional face image according to the target three-dimensional modeling parameters to obtain the three-dimensional facial features of the three-dimensional face image.
For example, the server may construct a three-dimensional face model corresponding to the target three-dimensional modeling parameter through the three-dimensional face reconstruction model to obtain a three-dimensional face image, obtain three-dimensional features of the three-dimensional face model, and use the three-dimensional features as the three-dimensional face features of the three-dimensional face image, or may construct a three-dimensional face model corresponding to the target three-dimensional modeling parameter through the three-dimensional face reconstruction model, perform local fitting or optimization on the three-dimensional face model, and the like to obtain a three-dimensional face image, obtain three-dimensional features of the adjusted three-dimensional face model, and use the three-dimensional features as the three-dimensional face features of the three-dimensional face image.
205. And the server splices the image texture features and the three-dimensional facial features to obtain the facial features.
For example, the server may directly stitch the image texture features and the three-dimensional facial features to obtain the facial features, or may obtain weighting parameters of the image texture features and the three-dimensional facial features, weight the image texture features and the three-dimensional facial features according to the weighting parameters, and fuse the weighted image texture features and the three-dimensional facial features to obtain the facial features, or may further obtain feature depths of the image texture features and the three-dimensional facial features, adjust the feature depths to uniform target feature depths, and stitch the image texture features and the three-dimensional facial features based on the target feature depths to obtain the facial features.
206. And the server adopts the trained image processing model to fuse the facial features and the attribute features to obtain the fused facial features.
For example, the server may decode the facial features and the attribute features by using a decoder network (formed by stacking 5 Res-blocks including AdaIN) of the trained image processing model, and fuse the decoded facial features and attribute features to obtain the fused facial features.
207. The server generates a target face image based on the fused facial features.
For example, the server may construct the face mask corresponding to the fused facial features to obtain an initial face mask, fuse the initial face mask, the fused facial features, and the attribute features to obtain the target facial features, adjust the target facial features, construct the face mask corresponding to the adjusted facial features to obtain the target face mask, and generate the target face image based on the target face mask and the adjusted facial features, which may specifically be as follows:
(1) The server constructs the face mask corresponding to the fused facial features to obtain an initial face mask.
For example, the server may perform feature extraction on the fused facial features by using a semantic segmentation network, identify a segmentation region in the face image or the face template image based on the extracted semantic features, segment the face image or the face template image based on the segmentation region, and perform occlusion processing on the segmented face image, thereby obtaining the initial face mask; or it may perform feature extraction on the fused facial features by using the semantic segmentation network and directly generate the initial face mask based on the extracted semantic features.
(2) The server fuses the initial face mask, the fused facial features, and the attribute features to obtain the target facial features.
For example, the server may perform feature transformation on the attribute features by using one or more Res-blocks to obtain the target attribute features of the object in the face template image, screen out the regions other than the initial face mask in the face image or the face template image to obtain a target image region, and determine weighting parameters according to the initial face mask and the target image region. The weighted facial features and the weighted attribute features are then directly added to obtain the target facial features, which may be specifically shown in formula (2).
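A minimal sketch of this mask-guided weighting, assuming the initial face mask has been resized to the feature maps' spatial size and takes values in [0, 1]:

```python
def fuse_target_features(face_mask, fused_feats, target_attr_feats):
    """Target facial features: fused facial features inside the initial
    face mask, target attribute features outside it."""
    return face_mask * fused_feats + (1.0 - face_mask) * target_attr_feats
```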
(3) The server adjusts the target facial features and constructs the face mask corresponding to the adjusted facial features to obtain the target face mask.
For example, the server may employ an upsampling structure composed of one or more Res-blocks to enlarge the size of the target facial features to a preset size, so as to obtain the adjusted facial features. A semantic segmentation network is then used to perform feature extraction on the adjusted facial features, identify a segmentation region in the face image or the face template image based on the extracted semantic features, segment the face image or the face template image based on the segmentation region, and perform occlusion processing on the segmented face image, thereby obtaining the target face mask; or the semantic segmentation network may perform feature extraction on the adjusted facial features and directly generate the target face mask based on the extracted semantic features.
(4) The server generates the target face image based on the target face mask and the adjusted facial features.
For example, the server may generate an initial face image according to the adjusted facial features, screen out the image in the target face mask from the initial face image to obtain a base face image, and screen out the image other than the target face mask from the face template image to obtain a background image. The base face image and the background image are spliced to obtain the target face image; or the sizes of the base face image and the background image are adjusted, and the size-adjusted base face image and background image are superposed to obtain the target face image, which may be specifically represented by formula (3).
As can be seen from the above, in the embodiments of the present application, after obtaining a face image of a source face and a face template image of a template face, feature extraction is performed on the face image and the face template image to obtain an image texture feature of a source object and an attribute feature of an object in the face template image, then, according to the face image and the face template image, face modeling is performed on the source object and the object in the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, the first three-dimensional modeling parameter and the second three-dimensional modeling parameter are fused to obtain a target three-dimensional modeling parameter, a three-dimensional face image is constructed according to the target three-dimensional modeling parameter to obtain a three-dimensional face feature of the three-dimensional face image, and then, the object in the face template image is replaced with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature, obtaining a target face image; according to the scheme, three-dimensional modeling parameters can be identified in the face image and the face template image, the three-dimensional face image is constructed based on the three-dimensional modeling parameters, so that three-dimensional face features are obtained, geometric constraints are obtained for the face features, and the three-dimensional face features, the image texture features and the attribute features are fused, so that the result of the replaced face is more real, and therefore the accuracy of face image processing can be improved.
In order to better implement the above method, an embodiment of the present invention further provides an image processing apparatus, which may be integrated in an electronic device, such as a server or a terminal, and the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 10, the image processing apparatus may include an acquisition unit 301, an extraction unit 302, a fusion unit 303, a construction unit 304, and a replacement unit 305 as follows:
(1) an acquisition unit 301;
an acquisition unit 301 for acquiring a face image of a source face and a face template image of a template face, the face image including a source object.
For example, the acquiring unit 301 may be specifically configured to directly acquire the face image and the face template image, or to indirectly acquire them when the number of face images and face template images is large or they occupy a large amount of memory.
(2) An extraction unit 302;
an extracting unit 302, configured to perform feature extraction on the face image and the face template image to obtain an image texture feature of the source object and an attribute feature of the object in the face template image.
For example, the extracting unit 302 may be specifically configured to perform feature coding on a face template image by using an encoder network of a trained image processing model to obtain attribute features of an object in the face template image, and perform feature extraction on the face image by using a face recognition network of the trained image processing model to obtain image texture features of a source object.
(3) A fusion unit 303;
the fusion unit 303 is configured to perform face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fuse the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter.
For example, the fusing unit 303 may be specifically configured to perform regression on a face image and a face template image by using a three-dimensional face reconstruction model, so as to perform face modeling on a source object and an object in the face template image, acquire a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image from the constructed face model, extract a face shape parameter corresponding to the face image from the first three-dimensional modeling parameter, extract a face action parameter corresponding to the face template image from the second three-dimensional modeling parameter, and fuse the face shape parameter and the face action parameter to obtain a target three-dimensional modeling parameter.
(4) A building unit 304;
and the constructing unit 304 is used for constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image.
For example, the constructing unit 304 may be specifically configured to construct a three-dimensional face model corresponding to a target three-dimensional modeling parameter through a three-dimensional face reconstruction model, obtain a three-dimensional face image, obtain three-dimensional features of the three-dimensional face model, and use the three-dimensional features as the three-dimensional face features of the three-dimensional face image, or may construct a three-dimensional face model corresponding to the target three-dimensional modeling parameter through the three-dimensional face reconstruction model, adjust the three-dimensional face model, for example, perform local fitting or optimization, and obtain a three-dimensional face image, obtain three-dimensional features of the adjusted three-dimensional face model, and use the three-dimensional features as the three-dimensional face features of the three-dimensional face image.
(5) A replacement unit 305;
a replacing unit 305, configured to replace the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature, so as to obtain a target face image.
For example, the replacing unit 305 may be specifically configured to splice the image texture features and the three-dimensional facial features to obtain the facial features, fuse the facial features and the attribute features by using the trained image processing model to obtain the fused facial features, and generate the target face image based on the fused facial features, where the target face image is an image obtained by replacing the object in the face template image with the source object.
Optionally, the image processing apparatus may further include a training unit 306, as shown in fig. 11, which may specifically be as follows:
the training unit 306 may be specifically configured to train a preset image processing model to obtain a trained image processing model.
For example, the training unit 306 may be specifically configured to obtain a face image sample set, screen at least one image sample pair from the face image sample set, where the image sample pair includes a face image sample and a face template image sample, replace an object in the face template image sample with an object in the face image sample by using a preset image processing model to obtain a predicted face image, and converge the preset image processing model based on the image sample pair and the predicted face image to obtain a trained image processing model.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in this embodiment, after the obtaining unit 301 obtains the face image of the source face and the face template image of the template face, the extracting unit 302 performs feature extraction on the face image and the face template image to obtain the image texture features of the source object and the attribute features of the object in the face template image; then the fusing unit 303 performs face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain the first three-dimensional modeling parameters of the source object and the second three-dimensional modeling parameters of the object in the face template image, and fuses the first three-dimensional modeling parameters and the second three-dimensional modeling parameters to obtain the target three-dimensional modeling parameters; the constructing unit 304 constructs a three-dimensional face image according to the target three-dimensional modeling parameters to obtain the three-dimensional facial features of the three-dimensional face image; and then the replacing unit 305 replaces the object in the face template image with the source object based on the image texture features, the three-dimensional facial features, and the attribute features to obtain a target face image. According to this scheme, three-dimensional modeling parameters can be identified in the face image and the face template image, and the three-dimensional face image constructed based on the three-dimensional modeling parameters, so that three-dimensional facial features are obtained and the facial features are geometrically constrained; fusing the three-dimensional facial features, the image texture features, and the attribute features makes the result of the replaced face more realistic, and therefore the accuracy of face image processing can be improved.
An embodiment of the present invention further provides an electronic device, as shown in fig. 12, which shows a schematic structural diagram of the electronic device according to the embodiment of the present invention, specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 12 is not limiting of electronic devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the whole electronic device by various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system, application programs required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply 403 may also include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
obtaining a face image of a source face and a face template image of a template face, the face image including a source object; performing feature extraction on the face image and the face template image to obtain image texture features of the source object and attribute features of the object in the face template image; performing face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter; constructing a three-dimensional face image according to the target three-dimensional modeling parameter to obtain three-dimensional face features of the three-dimensional face image; and replacing the object in the face template image with the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image.
For example, the electronic device acquires the face image and the face template image directly, or acquires them indirectly when the number of face images and face template images is large or they occupy a large amount of memory. The encoder network of the trained image processing model performs feature coding on the face template image to obtain the attribute features of the object in the face template image, and the face identification network of the trained image processing model performs feature extraction on the face image to obtain the image texture features of the source object. A three-dimensional face reconstruction model performs regression on the face image and the face template image to model the faces of the source object and the object in the face template image, and the first three-dimensional modeling parameter of the source object and the second three-dimensional modeling parameter of the object in the face template image are obtained from the constructed face models. A face shape parameter corresponding to the face image is extracted from the first three-dimensional modeling parameter, a face action parameter corresponding to the face template image is extracted from the second three-dimensional modeling parameter, and the face shape parameter and the face action parameter are fused to obtain the target three-dimensional modeling parameter. The three-dimensional face reconstruction model then constructs a three-dimensional face model corresponding to the target three-dimensional modeling parameter to obtain the three-dimensional face image, and the three-dimensional features of this model are taken as the three-dimensional face features of the three-dimensional face image; alternatively, after the three-dimensional face model is constructed, it is adjusted, for example by local fitting or optimization, to obtain the three-dimensional face image, and the three-dimensional features of the adjusted model are taken as the three-dimensional face features. Finally, the image texture features and the three-dimensional face features are spliced to obtain facial features, the trained image processing model fuses the facial features with the attribute features to obtain fused facial features, and a target face image is generated based on the fused facial features; the target face image is an image in which the object in the face template image has been replaced with the source object.
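For illustration only, the following is a minimal Python (PyTorch) sketch of the parameter-fusion step described above. The coefficient layout (80 identity, 64 expression and 7 pose coefficients, a common convention in 3DMM-based reconstruction) and all names are assumptions of this sketch, not the actual implementation of this embodiment.

```python
import torch

def fuse_3dmm_params(source_params: torch.Tensor,
                     template_params: torch.Tensor) -> torch.Tensor:
    # Keep the face shape (identity) coefficients of the source object and the
    # face action (expression and pose) coefficients of the template object.
    # Assumed layout: [0:80] identity, [80:144] expression, [144:151] pose.
    shape = source_params[..., 0:80]           # face shape parameter, from the source image
    expression = template_params[..., 80:144]  # expression part of the face action parameter
    pose = template_params[..., 144:151]       # pose part of the face action parameter
    # Concatenating the kept coefficients yields the target 3D modeling parameter.
    return torch.cat([shape, expression, pose], dim=-1)
```

Under these assumptions, the fused vector can then be fed back into the reconstruction model to build the three-dimensional face model whose features serve as the geometric constraint described above.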
For details of the above operations, reference may be made to the foregoing embodiments, which are not repeated here.
As can be seen from the above, in the embodiments of the present application, after the face image of the source face and the face template image of the template face are obtained, feature extraction is performed on the face image and the face template image to obtain the image texture features of the source object and the attribute features of the object in the face template image. Face modeling is then performed on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and the first and second three-dimensional modeling parameters are fused to obtain a target three-dimensional modeling parameter. A three-dimensional face image is constructed according to the target three-dimensional modeling parameter to obtain the three-dimensional face features of the three-dimensional face image, and the object in the face template image is replaced with the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image. In this scheme, three-dimensional modeling parameters are identified from the face image and the face template image, and a three-dimensional face image is constructed from them to obtain three-dimensional face features, which impose a geometric constraint on the facial features; fusing the three-dimensional face features with the image texture features and the attribute features makes the face-swapped result more realistic, thereby improving the accuracy of face image processing.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a computer-readable storage medium in which a plurality of instructions are stored, the instructions being loadable by a processor to perform the steps of any face image processing method provided by the embodiments of the present invention. For example, the instructions may perform the following steps:
acquiring a face image of a source face and a face template image of a template face, the face image including a source object; performing feature extraction on the face image and the face template image to obtain image texture features of the source object and attribute features of the object in the face template image; performing face modeling on the source object and the object in the face template image according to the face image and the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter; constructing a three-dimensional face image according to the target three-dimensional modeling parameter to obtain three-dimensional face features of the three-dimensional face image; and replacing the object in the face template image with the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image.
For example, the electronic device acquires the face image and the face template image directly, or acquires them indirectly when the number of face images and face template images is large or they occupy a large amount of memory. The encoder network of the trained image processing model performs feature coding on the face template image to obtain the attribute features of the object in the face template image, and the face identification network of the trained image processing model performs feature extraction on the face image to obtain the image texture features of the source object. A three-dimensional face reconstruction model performs regression on the face image and the face template image to model the faces of the source object and the object in the face template image, and the first three-dimensional modeling parameter of the source object and the second three-dimensional modeling parameter of the object in the face template image are obtained from the constructed face models. A face shape parameter corresponding to the face image is extracted from the first three-dimensional modeling parameter, a face action parameter corresponding to the face template image is extracted from the second three-dimensional modeling parameter, and the face shape parameter and the face action parameter are fused to obtain the target three-dimensional modeling parameter. The three-dimensional face reconstruction model then constructs a three-dimensional face model corresponding to the target three-dimensional modeling parameter to obtain the three-dimensional face image, and the three-dimensional features of this model are taken as the three-dimensional face features of the three-dimensional face image; alternatively, after the three-dimensional face model is constructed, it is adjusted, for example by local fitting or optimization, to obtain the three-dimensional face image, and the three-dimensional features of the adjusted model are taken as the three-dimensional face features. Finally, the image texture features and the three-dimensional face features are spliced to obtain facial features, the trained image processing model fuses the facial features with the attribute features to obtain fused facial features, and a target face image is generated based on the fused facial features; the target face image is an image in which the object in the face template image has been replaced with the source object.
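As a further illustration, the mask-guided fusion and background compositing used when generating the target face image can be sketched as follows; the soft-mask convention (values in [0, 1]) and all names are assumptions of this sketch rather than the embodiment's actual implementation.

```python
import torch

def fuse_with_mask(face_mask: torch.Tensor,
                   fused_face_features: torch.Tensor,
                   attribute_features: torch.Tensor) -> torch.Tensor:
    # Use the face mask as a per-position weighting parameter between the
    # fused facial features and the attribute features of the template image.
    return face_mask * fused_face_features + (1.0 - face_mask) * attribute_features

def composite_face(face_mask: torch.Tensor,
                   generated_face: torch.Tensor,
                   template_image: torch.Tensor) -> torch.Tensor:
    # Keep the generated face inside the target face mask and the template
    # image's background outside it, yielding the target face image.
    return face_mask * generated_face + (1.0 - face_mask) * template_image
```

A soft mask is assumed here so that both blends stay differentiable and the face border transitions smoothly into the template background.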
For details of the above operations, reference may be made to the foregoing embodiments, which are not repeated here.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can perform the steps of any face image processing method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method; for details, see the foregoing embodiments, which are not repeated here.
According to one aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the image processing aspect or the image face-changing aspect described above.
The face image processing method and apparatus and the computer-readable storage medium provided by the embodiments of the present invention are described in detail above. The principles and implementations of the present invention are explained herein using specific examples, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (16)

1. A face image processing method characterized by comprising:
obtaining a face image of a source face and a face template image of a template face, the face image including a source object;
extracting the features of the face image and the face template image to obtain image texture features of the source object and attribute features of the object in the face template image;
according to the face image and the face template image, performing face modeling on the source object and an object in the face template image to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter;
constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face characteristics of the three-dimensional face image;
and replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image.
2. The face image processing method according to claim 1, wherein said fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter to obtain a target three-dimensional modeling parameter comprises:
extracting a face shape parameter corresponding to the face image from the first three-dimensional modeling parameter;
extracting facial action parameters corresponding to the facial template image from the second three-dimensional modeling parameters;
and fusing the facial shape parameters and the facial action parameters to obtain target three-dimensional modeling parameters.
3. The face image processing method according to claim 1, wherein said replacing the object in the face template image with the source object based on the image texture features, the three-dimensional face features and the attribute features to obtain a target face image comprises:
splicing the image texture features and the three-dimensional face features to obtain facial features;
fusing the facial features and the attribute features by adopting a trained image processing model to obtain fused facial features;
and generating a target face image based on the fused facial features, the target face image being an image in which the object in the face template image is replaced with the source object.
4. The face image processing method according to claim 3, wherein said generating a target face image based on the fused facial features comprises:
constructing a face mask corresponding to the fused facial features to obtain an initial face mask;
fusing the initial face mask, the fused facial features and the attribute features to obtain target facial features;
adjusting the target facial features, and constructing a face mask corresponding to the adjusted facial features to obtain a target face mask;
and generating a target face image based on the target face mask and the adjusted facial features.
5. The face image processing method according to claim 4, wherein said fusing the initial face mask, the fused facial features and the attribute features to obtain target facial features comprises:
performing feature conversion on the attribute features to obtain target attribute features of the object in the face template image;
determining weighting parameters of the fused facial features and the target attribute features according to the initial face mask;
and weighting the fused facial features and the target attribute features according to the weighting parameters, and fusing the weighted facial features and the weighted attribute features to obtain the target facial features.
6. The face image processing method according to claim 4, wherein said generating a target face image based on the target face mask and the adjusted facial features comprises:
generating an initial face image according to the adjusted facial features, and selecting the image region within the target face mask from the initial face image to obtain a basic face image;
identifying the image region outside the target face mask in the face template image to obtain a background image;
and fusing the basic face image and the background image to obtain a target face image.
7. The face image processing method according to claim 3, wherein before said fusing the facial features and the attribute features by adopting a trained image processing model to obtain fused facial features, the method further comprises:
acquiring a facial image sample set, and screening at least one image sample pair from the facial image sample set, wherein the image sample pair comprises a facial image sample and a facial template image sample;
replacing the object in the face template image sample with the object in the face image sample by adopting a preset image processing model to obtain a predicted face image;
and converging the preset image processing model based on the image sample pair and the predicted face image to obtain a trained image processing model.
8. The method according to claim 7, wherein the replacing the object in the face template image sample with the object in the face image sample using a preset image processing model to obtain a predicted face image comprises:
performing feature extraction on the face image sample and the face template image sample by adopting the preset image processing model to obtain image sample texture features of the face image sample and sample attribute features of the object in the face template image sample;
identifying sample three-dimensional modeling parameters from the face image sample and the face template image sample respectively, and fusing the identified sample three-dimensional modeling parameters to obtain target sample three-dimensional modeling parameters;
constructing a sample three-dimensional face image according to the target sample three-dimensional modeling parameters to obtain sample three-dimensional face features of the sample three-dimensional face image, and fusing the sample three-dimensional face features, image sample texture features and sample attribute features to obtain fused sample face features;
and constructing a face mask corresponding to the fused sample face features to obtain an initial sample face mask, and generating a predicted face image based on the initial sample face mask, the fused sample face features and the sample attribute features.
9. The face image processing method according to claim 8, wherein said generating a predicted face image based on the initial sample face mask, the fused sample face features and the sample attribute features comprises:
fusing the initial sample face mask, the fused sample face features and the sample attribute features to obtain target sample face features;
adjusting the target sample facial features, and constructing a facial mask corresponding to the adjusted sample facial features to obtain a target sample facial mask;
generating a predicted face image based on the target sample face mask and the adjusted sample facial features.
10. The face image processing method according to claim 9, wherein said converging the preset image processing model based on the image sample pair and the predicted face image to obtain a trained image processing model comprises:
generating an initial predicted face image based on the target sample facial features and an initial sample face mask;
determining shape loss information of the image sample pair according to the sample three-dimensional face image, the initial predicted face image and the predicted face image;
respectively calculating the face similarity of the face image sample in the image sample pair with the predicted face image and the initial predicted face image to obtain image loss information of the image sample pair;
determining segmentation loss information for the image sample pair based on the initial sample face mask and a target sample face mask;
determining face loss information for the image sample pair from the image sample pair, a predicted face image, and an initial predicted face image;
and fusing the shape loss information, the image loss information, the segmentation loss information and the face loss information, and converging the preset image processing model based on the fused loss information to obtain a trained image processing model.
11. The method of claim 10, wherein said determining shape loss information for said image sample pair from said sample three-dimensional face image, an initial predicted face image, and a predicted face image comprises:
acquiring first projection information of the sample three-dimensional face image, and extracting first position information of a face contour from the first projection information;
constructing the initial prediction face image and a target three-dimensional face image corresponding to the prediction face image, and acquiring second projection information of the target three-dimensional face image;
and extracting, from the second projection information, second position information of the face contour in the initial predicted face image and third position information of the face contour in the predicted face image, and calculating the distances between the face contours according to the first position information, the second position information and the third position information respectively, so as to obtain the shape loss information of the image sample pair.
12. The face image processing method according to claim 10, wherein said determining segmentation loss information for the image sample pair based on the initial sample face mask and the target sample face mask comprises:
obtaining a template image mask of the face template image sample in the image sample pair;
adjusting the size of the template image mask to obtain an adjusted template image mask;
and respectively calculating the difference values of the adjusted template image mask with the initial sample face mask and the target sample face mask, and fusing the difference values to obtain the segmentation loss information of the image sample pair.
13. The method of claim 10, wherein determining face loss information for the pair of image samples from the pair of image samples, a predicted face image, and an initial predicted face image comprises:
respectively calculating the similarity of the image sample pair with the predicted face image and the initial predicted face image to obtain similarity loss information of the image sample pair;
determining adversarial loss information and cycle loss information of the image sample pair according to the face template image sample in the image sample pair, the predicted face image and the initial predicted face image;
and taking the similarity loss information, the adversarial loss information and the cycle loss information as the face loss information of the image sample pair.
14. The method according to claim 13, wherein said separately calculating the similarity of the pair of image samples to the predicted face image and an initial predicted face image to obtain the similarity loss information of the pair of image samples comprises:
when the object in the face image sample and the object in the face template image sample are the same object, respectively calculating the spatial similarity of the face template image sample with the predicted face image and the initial predicted face image to obtain spatial similarity loss information of the image sample pair;
extracting image features of the face template image sample, the predicted face image and the initial predicted face image, and calculating feature similarity between the image features to obtain feature similarity loss information of the image sample pair;
and taking the spatial similarity loss information and the characteristic similarity loss information as similarity loss information of the image sample pair.
15. A face image processing apparatus characterized by comprising:
an acquisition unit configured to acquire a face image of a source face and a face template image of a template face, the face image including a source object;
the extracting unit is used for performing feature extraction on the face image and the face template image to obtain the image texture features of the source object and the attribute features of the object in the face template image;
the fusion unit is used for carrying out face modeling on the source object and an object in the face template image according to the face image and the face template image so as to obtain a first three-dimensional modeling parameter of the source object and a second three-dimensional modeling parameter of the object in the face template image, and fusing the first three-dimensional modeling parameter and the second three-dimensional modeling parameter so as to obtain a target three-dimensional modeling parameter;
the construction unit is used for constructing a three-dimensional face image according to the target three-dimensional modeling parameters to obtain three-dimensional face features of the three-dimensional face image;
and the replacing unit is used for replacing the object in the face template image with the source object based on the image texture feature, the three-dimensional face feature and the attribute feature to obtain a target face image.
16. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the face image processing method according to any one of claims 1 to 14.
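For illustration only, the following Python sketch shows one way the loss terms recited in claims 10 to 14 could be combined during training; the weighted-sum form, the weights and all function names are assumptions of this sketch, not the claimed method itself.

```python
def face_loss(similarity_loss: float, adversarial_loss: float,
              cycle_loss: float) -> float:
    # Claim 13: the face loss information consists of the similarity,
    # adversarial and cycle loss information.
    return similarity_loss + adversarial_loss + cycle_loss

def total_loss(shape_loss: float, image_loss: float,
               segmentation_loss: float, face_loss_value: float,
               w_shape: float = 1.0, w_image: float = 1.0,
               w_seg: float = 1.0, w_face: float = 1.0) -> float:
    # Claim 10: fuse the shape, image, segmentation and face loss information;
    # a weighted sum is one simple way to fuse scalar loss terms.
    return (w_shape * shape_loss + w_image * image_loss
            + w_seg * segmentation_loss + w_face * face_loss_value)
```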
CN202110648608.5A 2021-06-10 2021-06-10 Face image processing method and device and computer readable storage medium Pending CN114943799A (en)

Priority Applications (1)

Application Number: CN202110648608.5A
Priority Date: 2021-06-10
Filing Date: 2021-06-10
Title: Face image processing method and device and computer readable storage medium

Publications (1)

Publication Number: CN114943799A
Publication Date: 2022-08-26

Family ID: 82906041

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993576A (en) * 2023-06-21 2023-11-03 北京开普云信息科技有限公司 Video face changing method, device, storage medium and equipment
CN116993576B (en) * 2023-06-21 2024-03-08 北京开普云信息科技有限公司 Video face changing method, device, storage medium and equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40073936; Country of ref document: HK)