CN116630138A - Image processing method, apparatus, electronic device, and computer-readable storage medium

Info

Publication number: CN116630138A
Application number: CN202210121870.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: face, image, facial, sample, features
Legal status: Pending
Inventors: 王欣睿, 李琛
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority to CN202210121870.9A; published as CN116630138A

Classifications

    • G06T3/04
    • G06T7/11 Region-based segmentation (under G06T7/00 Image analysis; G06T7/10 Segmentation; Edge detection)
    • G06T2207/30201 Face (under G06T2207/30 Subject of image; G06T2207/30196 Human being; Person)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract

The embodiments of the invention disclose an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. After at least one facial image sample pair is obtained, feature extraction is performed on the facial image sample with a preset image processing model to obtain sample facial features; face replacement is performed on the facial template image sample according to the sample facial features to obtain a replaced facial image; the facial image sample is then reconstructed based on the sample facial features to obtain a reconstructed facial image; the preset image processing model is converged according to the facial image sample, the replaced facial image, and the reconstructed facial image to obtain a trained image processing model; and the trained image processing model is used to perform face replacement on a facial image to be processed. The scheme can improve the accuracy of image processing, and the embodiments of the invention can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving.

Description

Image processing method, apparatus, electronic device, and computer-readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer readable storage medium.
Background
In recent years, with the development of technology, applications such as film special effects and Internet social networking need to replace the face of one object with the face of another object while maintaining the style of the object in the face image. To meet this need, the face image must be processed; the conventional image processing method mainly uses a many-to-many image processing model to perform the face replacement.
In the research and practice of the prior art, the inventors found that such many-to-many image processing models are usually trained on a large number of unlabeled face images. Because supervision information is lacking, training of the image processing model is unstable and prone to producing bad cases, so the accuracy of image processing is low.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing device, electronic equipment and a computer readable storage medium, which can improve the accuracy of image processing.
An image processing method, comprising:
Acquiring at least one facial image sample pair, the facial image sample pair comprising a facial image sample and a facial template image sample;
extracting features of the facial image sample by adopting a preset image processing model to obtain sample facial features;
performing face replacement on the face template image sample according to the sample face characteristics to obtain a replaced face image;
reconstructing the facial image sample based on the sample facial features to obtain a reconstructed facial image;
and converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and adopting the trained image processing model to replace the face of the face image to be processed.
Accordingly, an embodiment of the present invention provides an image processing apparatus, including:
an acquisition unit configured to acquire at least one pair of face image samples including a face image sample and a face template image sample;
the extraction unit is used for carrying out feature extraction on the facial image sample by adopting a preset image processing model to obtain sample facial features;
a first replacing unit, configured to perform face replacement on the face template image sample according to the sample facial features, to obtain a replaced face image;
a reconstruction unit, configured to reconstruct the facial image sample based on the sample facial feature, to obtain a reconstructed facial image;
and the second replacing unit is used for converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and adopting the trained image processing model to replace the face of the face image to be processed.
Optionally, in some embodiments, the second replacing unit includes:
an acquisition subunit, configured to acquire a face image to be processed and a face template image corresponding to the face image to be processed;
an extraction subunit, configured to perform feature extraction on a to-be-processed face image and a face template image by using the trained image processing model, so as to obtain a face feature of the to-be-processed face image and a face template feature of the face template image;
the construction subunit is used for fusing the facial features and the facial template features and constructing a fused facial image based on the fused facial features;
A segmentation subunit, configured to segment the face template image and the fused face image to obtain a face area;
and the fusion subunit is used for carrying out face fusion on the fusion face image and the face template image based on the face area to obtain a target face image.
Optionally, in some embodiments, the first replacing unit may be specifically configured to perform feature extraction on the face template image sample by using the preset image processing model to obtain a face style feature; performing multi-size fusion on the facial style characteristics and the sample facial characteristics to obtain first fused sample facial characteristics under each size; and carrying out face replacement on the face template image sample based on the face characteristics of the first fused sample to obtain a replaced face image.
Optionally, in some embodiments, the first replacing unit may specifically be configured to extract a basic facial style feature at each size from facial style features, and extract a basic sample facial feature at each size from the sample facial features; calculating style feature parameters of the basic facial style features and facial feature parameters of the basic sample facial features; and fusing the basic facial style characteristics and the basic sample facial characteristics based on the style characteristic parameters and the facial characteristic parameters to obtain first fused sample facial characteristics under each size.
Optionally, in some embodiments, the first replacing unit may be specifically configured to fuse the basic facial style feature and the basic sample facial feature under the same size based on the style feature parameter and the facial feature parameter, to obtain an initial fused sample facial feature under each size; determining a face area mask corresponding to the face image sample under each size according to the initial fused sample face features, wherein the face area mask is used for indicating attention weights among the initial fused face features; and fusing the facial region mask with the initial fused sample facial features to obtain first fused sample facial features in each size.
Optionally, in some embodiments, the first replacing unit may be specifically configured to extract an associated feature from the initially fused sample facial features; determining an association weight corresponding to each initial fused sample facial feature according to the association features, wherein the association weights are used for indicating association relations between the initial fused sample facial features; and generating a face area mask corresponding to the face image sample under each size based on the association weight.
Optionally, in some embodiments, the first replacing unit may be specifically configured to calculate a variance ratio of the style feature variance to the facial feature variance; calculate a feature difference value between the feature value of the basic sample facial feature and the facial feature mean; and fuse the variance ratio, the feature difference value, and the style feature mean to obtain the initial fused sample facial features at each size.
Optionally, in some embodiments, the first replacing unit may be specifically configured to generate, according to the first fused sample facial features, a basic facial image of a size corresponding to each of the first fused facial features; sorting the basic face images according to the sizes of the basic face images to obtain sorting information; and adjusting the size of the basic face image according to the ordering information to obtain a replaced face image.
Optionally, in some embodiments, the first replacing unit may be specifically configured to screen a basic face image with a smallest size from the basic face images to obtain a current basic face image; amplifying the size of the current basic face image to obtain an amplified basic face image; screening a basic face image next to the current basic face image from the basic face images according to the ordering information to obtain a target basic face image; and fusing the amplified basic face image and the target basic face image to obtain the replaced face image.
Optionally, in some embodiments, the first replacing unit may specifically be configured to fuse the enlarged basic face image and the target basic face image to obtain a fused basic face image; amplifying the size of the fused basic face image to obtain a target amplified basic face image; taking the target basic face image as the current basic face image, and taking the target amplified basic face image as the amplified basic face image; and returning to the step of screening the next basic face image of the current basic face image from the basic face images according to the sorting information until the last basic face image is screened, and fusing the screened target basic face image with the amplified basic face image to obtain a replaced basic face image.
Optionally, in some embodiments, the second replacing unit may be specifically configured to determine reconstructed perceptual loss information of the pair of face image samples according to the face image sample and the reconstructed face image; determine label perceptual loss information of the face image sample pair according to the face image sample and the replaced face image; fuse the reconstructed perceptual loss information and the label perceptual loss information to obtain perceptual loss information of the facial image sample pair; determine adversarial loss information for the pair of face image samples based on the replaced face image and the face image sample; and fuse the perceptual loss information and the adversarial loss information, and converge the preset image processing model based on the fused loss information to obtain the trained image processing model.
Optionally, in some embodiments, the second replacing unit may be specifically configured to obtain face tag information in the face image sample, and identify a face tag area in the face image sample according to the face tag information; identifying a target face area corresponding to the face tag area in the replaced face image; extracting features of the face tag region to obtain face tag region features, and extracting features of the target face region to obtain target face region features; and calculating a characteristic difference value between the target facial region characteristic and the facial label region characteristic to obtain label perception loss information of the facial image sample pair.
Optionally, in some embodiments, the reconstructing unit may be specifically configured to perform feature extraction on the facial image sample by using the preset image processing model to obtain a sample facial style feature; performing multi-size fusion on the sample facial style characteristics and the sample facial characteristics to obtain second fused sample facial characteristics under each size; and carrying out face replacement on the face image sample based on the face characteristics of the second fused sample to obtain a reconstructed face image sample.
Optionally, in some embodiments, the second replacing unit may be specifically configured to acquire a face image to be processed and a face template image corresponding to the face image to be processed; respectively extracting features of a face image to be processed and a face template image by adopting the trained image processing model to obtain the face features of the face image to be processed and the face template features of the face template image; fusing the facial features and the facial template features, and constructing a fused facial image based on the fused facial features; performing face region segmentation on the face template image and the fused face image to obtain a face region; and carrying out face fusion on the fused face image and the face template image based on the face area to obtain a target face image.
Optionally, in some embodiments, the second replacing unit may be specifically configured to convert the facial feature into a preset number of basic facial features, and determine a convolution layer corresponding to each of the basic facial features; converting the facial template features into the preset number of basic facial template features, and determining a convolution layer corresponding to each basic facial template feature; and according to the convolution layer, fusing the basic facial features and the basic facial template features to obtain fused facial features.
Optionally, in some embodiments, the second replacing unit may be specifically configured to screen out a basic facial feature corresponding to a preset first convolution layer from the basic facial features to obtain a target basic facial feature; screening basic facial template characteristics corresponding to a preset second convolution layer from the basic facial template characteristics to obtain target basic facial template characteristics; and adopting the convolution layer to fuse the target basic facial features and the target basic facial template features to obtain fused facial features.
Optionally, in some embodiments, the second replacing unit may be specifically configured to perform multidimensional feature extraction on the fused face image to obtain facial area features with multiple dimensions; carrying out multidimensional feature extraction on the face template image to obtain template face region features with multiple dimensions; segmenting a fused face area in the fused face image according to the facial area characteristics; and dividing a template face area in the face template image according to the template face area characteristics, and taking the fused face area and the template face area as the face area.
Optionally, in some embodiments, the second replacing unit may be specifically configured to globally pool the facial region features to obtain first pooled facial region features; determining a first attention weight for each of the facial region features based on the first pooled facial region features; and according to the first attention weight, segmenting a fused face area in the fused face image.
Optionally, in some embodiments, the second replacing unit may be specifically configured to weight the facial area feature based on the first attention weight to obtain a first weighted facial area feature; global pooling of the first weighted facial region features and determining a second attention weight of the first weighted facial region features based on the pooled second pooled facial region features; and segmenting a fused face area in the fused face image according to the first weighted face area characteristic and the second attention weight.
Optionally, in some embodiments, the second replacing unit is specifically configured to weight the first weighted facial area feature based on the second attention weight to obtain a second weighted facial area feature; dividing a current face mask in the fused face image according to the first weighted face region feature and the second weighted face region feature; and identifying the region corresponding to the current facial mask in the fused facial image to obtain a fused facial region.
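This pooled-attention segmentation is described only in prose above; purely as an illustrative sketch (the module name, layer choices, and tensor layout are assumptions, not part of the disclosure), it could be realized in PyTorch roughly as follows:

    import torch
    import torch.nn as nn

    class PooledAttentionSeg(nn.Module):
        """Hypothetical sketch: two rounds of global pooling produce channel attention
        weights that re-weight the facial region features before a 1x1 convolution
        predicts the current face mask."""
        def __init__(self, channels: int):
            super().__init__()
            self.attn1 = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
            self.attn2 = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
            self.to_mask = nn.Conv2d(channels * 2, 1, kernel_size=1)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = feats.shape
            w1 = self.attn1(feats.mean(dim=(2, 3))).view(b, c, 1, 1)   # first attention weights
            f1 = feats * w1                                            # first weighted features
            w2 = self.attn2(f1.mean(dim=(2, 3))).view(b, c, 1, 1)      # second attention weights
            f2 = f1 * w2                                               # second weighted features
            return torch.sigmoid(self.to_mask(torch.cat([f1, f2], dim=1)))  # current face mask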
Optionally, in some embodiments, the second replacing unit may be specifically configured to perform color correction on the fused face image according to the fused face area and the template face area, to obtain a corrected fused face image; detecting a face key point in the face template image to obtain a first face key point, and detecting a face key point in the corrected fused face image to obtain a second face key point; and carrying out face fusion on the fused face image and the face template image according to the first face key point and the second face key point to obtain a target face image.
Optionally, in some embodiments, the second replacing unit may be specifically configured to perform color space conversion on the fused face area and the template face area to obtain a target fused face area and a target template face area in a target color space; calculating the color parameters of the target fusion face area to obtain a first color parameter, and calculating the color parameters of the target template face area to obtain a second color parameter; and correcting each color channel of the target fusion face area according to the first color parameter and the second color parameter to obtain a corrected fusion face image.
Optionally, in some embodiments, the second replacing unit may be specifically configured to calculate a variance ratio of the first pixel variance and the second pixel variance under each color channel, and calculate a pixel difference value between the current pixel value and the first pixel mean under each color channel; fusing the pixel difference value, the variance ratio and the second pixel mean value to obtain a target pixel value of each color channel; and replacing the current pixel value of each color channel of the target fusion face area with a corresponding target pixel value to obtain a corrected fusion face image.
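Only as an illustrative sketch (not part of the disclosure), the per-channel correction described above can be written as follows, assuming each region is a (C, H, W) tensor in the target color space and that the variance ratio rescales the fused region toward the template region's statistics:

    import torch

    def color_correct(fused_region: torch.Tensor, template_region: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
        """Per-channel correction: shift/scale the fused face region so its per-channel
        pixel statistics match those of the template face region."""
        f, t = fused_region.flatten(1), template_region.flatten(1)               # (C, H*W)
        mu_f, std_f = f.mean(-1, keepdim=True), f.std(-1, keepdim=True) + eps    # first pixel stats
        mu_t, std_t = t.mean(-1, keepdim=True), t.std(-1, keepdim=True)          # second pixel stats
        corrected = (f - mu_f) * (std_t / std_f) + mu_t     # fuse difference, variance ratio, mean
        return corrected.view_as(fused_region)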
Optionally, in some embodiments, the second replacing unit may be specifically configured to perform feature extraction on the face template image to obtain lightweight face key point features with multiple dimensions; fusing the light-weight facial key point features to obtain facial key point features; and identifying the facial key points in the facial template image according to the facial key point characteristics to obtain first facial key points.
Optionally, in some embodiments, the second replacing unit may be specifically configured to compare the first facial key points with the second facial key points to obtain key point deformation information; adjust the lightweight facial key point features based on the key point deformation information; determine, according to the adjusted facial key point features, interpolation information for deforming the first facial key points to the second facial key points; deform, based on the interpolation information, the abnormal key points at which the first facial key points and the second facial key points differ, to obtain a deformed face template image; and fuse the fused face image and the deformed face template image to obtain the target face image.
In addition, the embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the memory stores an application program, and the processor is used for running the application program in the memory to realize the image processing method provided by the embodiment of the application.
In addition, the embodiment of the application further provides a computer readable storage medium, wherein the computer readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in any one of the image processing methods provided by the embodiment of the application.
After at least one facial image sample pair is obtained, feature extraction is performed on the facial image sample with a preset image processing model to obtain sample facial features; face replacement is performed on the facial template image sample according to the sample facial features to obtain a replaced facial image; the facial image sample is then reconstructed based on the sample facial features to obtain a reconstructed facial image; the preset image processing model is converged according to the facial image sample, the replaced facial image, and the reconstructed facial image to obtain a trained image processing model; and the trained image processing model is used to perform face replacement on the facial image to be processed. In this scheme, after feature extraction is performed on the facial image sample, the facial image sample can be reconstructed, so that a triplet data set consisting of the facial image sample, the facial template image sample, and the reconstructed facial image is formed. Converging the preset image processing model on this triplet data greatly improves the stability and precision of the image processing training, and can therefore improve the accuracy of image processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario of an image processing method according to an embodiment of the present invention;
FIG. 2 is another flow chart of an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of cascade-amplifying a basic face image according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of face replacement of a face template image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram for training a preset image processing model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an attention module according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of segmenting a current face mask from a fused face image according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of processing a facial template image according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart of fusing a face image to be processed and a face template image according to an embodiment of the present invention;
FIG. 10 is another flow chart of an image processing method according to an embodiment of the present invention;
fig. 11 is a schematic structural view of an image processing apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural view of a second replacement unit in the image processing apparatus provided in the embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides an image processing method, an image processing device, electronic equipment and a computer readable storage medium. The image processing apparatus may be integrated in an electronic device, which may be a server or a terminal.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, big data, and artificial intelligence platforms. Terminals include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, smart appliances, vehicle-mounted terminals, aircraft, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. The embodiments of the application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
For example, referring to fig. 1, taking an example that an image processing apparatus is integrated in an electronic device, after at least one facial image sample pair is obtained, the electronic device performs feature extraction on a facial image sample in the facial image sample pair by using a preset image processing model to obtain a sample facial feature, performs face replacement on the facial template image sample according to the sample facial feature to obtain a replaced facial image, and then reconstructs the facial image sample based on the sample facial feature to obtain a reconstructed facial image, converges the preset image processing model according to the facial image sample, the replaced facial image and the reconstructed facial image to obtain a trained image processing model, and performs face replacement on the facial image to be processed by using the trained image processing model, thereby improving accuracy of image processing.
The image processing here may be understood as face image processing, that is, replacing a face object in a face template image. Taking a human face as an example, the face image processing may be face swapping: the face in the face template image is replaced with the face in a target image, while elements such as the pose, expression, makeup, and background of the face template image remain unchanged. It can be used in film and television production, game entertainment, and e-commerce sales.
The image processing method provided by the embodiment of the application relates to the computer vision direction in the field of artificial intelligence. The embodiment of the application can extract the characteristics of the facial image sample, and based on the extracted sample facial characteristics, the facial template image is subjected to facial replacement and the facial image sample is reconstructed.
Among these, artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence software technology mainly includes directions such as computer vision technology and machine learning/deep learning.
Computer vision (CV) technology is a science that studies how to make machines "see"; more specifically, it uses a computer instead of human eyes to perform machine vision tasks such as recognition and measurement on a target, and further performs image processing, so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, and the like, and also includes common biometric recognition technologies such as face recognition and human posture recognition.
The following will describe in detail. The following description of the embodiments is not intended to limit the preferred embodiments.
The present embodiment will be described from the perspective of an image processing apparatus, which may be integrated in an electronic device; the electronic device may be a server, a terminal, or the like. The terminal may include a tablet computer, a notebook computer, a personal computer (PC), a wearable device, a virtual reality device, or another intelligent device capable of performing face replacement.
An image processing method, comprising:
obtaining at least one face image sample pair, wherein the face image sample pair comprises a face image sample and a face template image sample, performing feature extraction on the face image sample by adopting a preset image processing model to obtain sample face features, performing face replacement on the face template image sample according to the sample face features to obtain a replaced face image, reconstructing the face image sample based on the sample face features to obtain a reconstructed face image, converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and performing face replacement on the face image to be processed by adopting the trained image processing model.
As shown in fig. 2, the specific flow of the image processing method is as follows:
101. at least one facial image sample pair is acquired.
Wherein the facial image sample pair may include a facial image sample and a facial template image sample. The face image sample may be an image sample that provides a replacement of a face source object, and the face template image sample may be a template image sample that requires replacement of a face object with a face source object while leaving the area elements other than the face unchanged.
The manner of acquiring the facial image sample pairs may be various, and specifically may be as follows:
for example, a face image sample pair sent by the terminal may be received, where the face image sample pair includes a face image sample and a face template image sample. Alternatively, face image samples may be obtained from a network or an image database and sent to a template image server, and the face template image sample corresponding to each face image sample returned by the template image server is received; the face image sample and the face template image sample then form a face image sample pair. Alternatively, original face images may be obtained from a network or an image database, and a face image sample and its corresponding face template image sample are screened from the original face images to obtain a face image sample pair, where the screening condition may be that the objects in the face image sample and the face template image sample have different faces. Alternatively, when the number of face image sample pairs is large or they occupy a large amount of memory, an image processing request may be received, where the request carries a storage address of the face image sample pairs, and the face image sample pairs are obtained according to the storage address.
102. And carrying out feature extraction on the facial image sample by adopting a preset image processing model to obtain the facial features of the sample.
The sample facial feature may be feature information characterizing a facial region in the facial image sample, and the sample facial feature may be a multi-dimensional feature vector.
The feature extraction method for the facial image sample by adopting the preset image processing model can be various, and specifically can be as follows:
for example, a feature extraction network of the preset image processing model may be used to perform convolutional feature extraction on the facial image sample to obtain an initial sample facial feature, and a fully connected layer is then used to process the initial sample facial feature to obtain a 256-dimensional feature vector, which is used as the sample facial feature.
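The disclosure gives no code for this encoder; as a minimal, hypothetical PyTorch-style sketch (all names, layer counts, and sizes are assumptions), such a face encoder could be a small convolutional backbone followed by a fully connected layer that outputs a 256-dimensional sample facial feature:

    import torch
    import torch.nn as nn

    class FaceEncoder(nn.Module):
        """Hypothetical face encoder: convolutional backbone + fully connected head
        producing a 256-dimensional sample facial feature."""
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),          # collapse spatial dimensions
            )
            self.fc = nn.Linear(256, feat_dim)    # fully connected layer -> 256-d vector

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.backbone(x).flatten(1)       # (B, 256) initial sample facial feature
            return self.fc(h)                     # (B, 256) sample facial feature

    # usage: feat = FaceEncoder()(torch.randn(1, 3, 256, 256))  # feat.shape == (1, 256)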
103. And carrying out face replacement on the face template image sample according to the face characteristics of the sample to obtain a replaced face image.
Here, face replacement may be replacing the face area in the face template image sample with the face area in the face image sample. Taking a human face as an example, face replacement (also called face swapping or face fusion) replaces the face in the face template image sample with the face in the face image sample, while the areas other than the face area remain unchanged.
The face replacement method for the face template image sample according to the face characteristics of the sample can be various, and specifically can be as follows:
for example, a preset image processing model may be used to perform feature extraction on the face template image to obtain facial style features, and the facial style features and the sample facial features are fused in multiple sizes to obtain first fused sample facial features under each size, and based on the first fused sample facial features, facial replacement is performed on the face template image sample to obtain a replaced face image.
Wherein facial style characteristics may be used to indicate feature information of a facial region style in the facial template image sample. There are various ways of performing multi-size fusion on the facial style features and the sample facial features, for example, basic facial style features under each size may be extracted from the facial style features, basic sample facial features under each size may be extracted from the sample facial features, style feature parameters of the basic facial style features and facial feature parameters of the basic sample facial features are calculated, and based on the style feature parameters and the facial feature parameters, the basic facial style features and the basic sample facial features are fused to obtain first fused sample facial features under each size.
The style feature parameters may include a style feature variance and a style feature mean, and the face feature parameters include a face feature variance and a face feature mean, so there may be various ways of calculating the style feature parameters of the basic face style feature and the face feature parameters of the basic sample face feature, for example, for the style feature parameters, a face style feature value under each feature channel in the basic face style feature may be obtained, a variance of the face style feature value is calculated, a style feature variance is obtained, and a mean of the face style feature values is calculated, and a style feature mean is obtained, where the style feature variance and the style feature mean are used as the style feature parameters. For the facial feature parameters, a facial feature value under each feature channel in the basic sample facial features can be obtained, then, the variance of the facial feature values is calculated to obtain facial feature variances, the mean of the facial feature values is calculated to obtain a facial feature mean, and the facial feature variances and the facial feature mean are taken as the facial feature parameters.
After the style characteristic parameters and the face characteristic parameters are calculated, the basic face style characteristic and the basic sample face characteristic can be fused to obtain a first fused sample face characteristic under each size, for example, based on the style characteristic parameters and the face characteristic parameters, the basic face style characteristic and the basic sample face characteristic under the same size are fused to obtain an initial fused sample face characteristic under each size, and according to the initial fused sample face characteristic, a face area mask corresponding to a face image sample under each size is determined, wherein the face area mask is used for indicating the attention weight between the initial fused face characteristics, and the face area mask is fused with the initial fused sample face characteristic to obtain the first fused sample face characteristic under each size.
The manner of fusing the basic facial style feature and the basic sample facial feature based on the style feature parameters and the facial feature parameters may be various. For example, a variance ratio of the style feature variance to the facial feature variance is calculated, a feature difference value between the feature value of the basic sample facial feature and the facial feature mean is calculated, and the variance ratio, the feature difference value, and the style feature mean are fused to obtain the initial fused sample facial features at each size, as shown in formula (1):

AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y)  (1)

wherein AdaIN(x, y) is the initial fused sample facial feature, σ(y) is the style feature variance, σ(x) is the facial feature variance, μ(x) is the facial feature mean, μ(y) is the style feature mean, x is the basic sample facial feature, and y is the basic facial style feature. This feature fusion may be adaptive instance normalization (Adaptive Instance Normalization, AdaIN). The fusion is performed mainly between a basic facial style feature and a basic sample facial feature of the same size, and the resulting first fused sample facial feature differs from the first fused sample facial features obtained at other sizes. The first fused sample facial feature may have the same size as the corresponding basic facial style feature or basic sample facial feature, or a different size.
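Purely for illustration (not part of the disclosure), formula (1) can be realized as a small routine that computes per-channel statistics of the two features and fuses them; the NCHW tensor layout and the use of standard deviation in place of "variance" follow the usual AdaIN convention and are assumptions:

    import torch

    def adain(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        """AdaIN(x, y): re-normalize content features x (basic sample facial features)
        with the per-channel statistics of style features y (basic facial style features)."""
        xf, yf = x.flatten(2), y.flatten(2)                    # (B, C, H*W)
        mu_x, std_x = xf.mean(-1), xf.std(-1) + eps            # facial feature mean / deviation
        mu_y, std_y = yf.mean(-1), yf.std(-1) + eps            # style feature mean / deviation
        mu_x, std_x, mu_y, std_y = (t[..., None, None] for t in (mu_x, std_x, mu_y, std_y))
        return std_y * (x - mu_x) / std_x + mu_y               # formula (1)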
After the basic facial style features and the basic sample facial features are fused under the same size, a facial area mask corresponding to the facial image sample under each size can be determined according to the initial fused sample facial features, and various manners of determining the facial area mask can be adopted, for example, the association features are extracted from the initial fused sample facial features, the association weights corresponding to the initial fused sample facial features are determined according to the association features, the association weights are used for indicating the association relation between the initial fused sample facial features, and the facial area mask corresponding to the facial image sample under each size is generated based on the association weights.
The manner of generating the face region mask corresponding to the face image sample under each size may be various based on the association weight, for example, the pixel value of each pixel of the face image sample may be weighted based on the association weight, so that the target face region is segmented in the weighted pixel value, and the target face region is converted into the face region mask, or the feature value under each feature channel of the sample face feature of the face image sample may be weighted based on the association weight, and the target face region is identified in the face image sample based on the weighted sample face feature, and the target face region is converted into the face region mask.
After determining the facial region mask corresponding to the facial image sample under each size, the facial region mask and the facial features of the sample after initial fusion can be fused in various manners, for example, the facial features of the sample after initial fusion can be directly multiplied by the facial region mask, so as to obtain the facial features of the first sample after fusion under each size, or the feature value corresponding to each pixel in the facial region mask can be further multiplied by the feature of the corresponding pixel position in the facial features of the sample after initial fusion, so as to obtain the facial features of the sample after first fusion under each size.
It should be noted that, the manner of extracting the relevant features, determining the relevant weights, generating the facial region mask, and fusing the facial region mask with the initially fused sample facial features may be an attention mask mechanism, so as to pay more attention to the features of the facial region of the sample facial features in the process of fusing the facial style features and the sample facial features, thereby improving the accuracy of facial replacement.
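As one possible, hypothetical realization of this attention mask mechanism (the layer choices are assumptions, not taken from the disclosure), a small module can predict a face-region mask from the initially fused features and re-weight them:

    import torch
    import torch.nn as nn

    class AttentionMask(nn.Module):
        """Hypothetical attention-mask module: predicts a face-region mask from the
        initially fused sample facial features and re-weights those features with it."""
        def __init__(self, channels: int):
            super().__init__()
            self.assoc = nn.Conv2d(channels, channels, 3, padding=1)   # associated features
            self.to_mask = nn.Conv2d(channels, 1, 1)                   # association weights -> mask

        def forward(self, fused: torch.Tensor) -> torch.Tensor:
            assoc = torch.relu(self.assoc(fused))
            mask = torch.sigmoid(self.to_mask(assoc))   # face-region mask in [0, 1]
            return fused * mask                         # first fused sample facial features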
After the first fused sample facial features are obtained, face replacement may be performed on the face template image sample to obtain the replaced face image. This can be done in various ways. For example, a basic face image of the size corresponding to each first fused sample facial feature is generated according to the first fused sample facial features; the basic face images are sorted according to their sizes to obtain sorting information; and the sizes of the basic face images are adjusted according to the sorting information to obtain the replaced face image.
There may be various ways of generating a basic facial image of the size corresponding to each first fused sample facial feature. For example, an image operator (toRGB) may be used to directly convert the first fused sample facial feature into a basic facial image of the corresponding size; or the toRGB operator may be used to convert the first fused sample facial feature into pixel features in the RGB color space, and a basic facial image of the corresponding size is generated based on those pixel features.
The size of the basic face image is adjusted according to the sorting information to obtain a replaced face image, for example, the basic face image with the smallest size can be selected from the basic face images to obtain a current basic face image, the size of the current basic face image is amplified to obtain an amplified basic face image, the next basic face image of the current basic face image is selected from the basic face images according to the sorting information to obtain a target basic face image, and the amplified basic face image and the target basic face image are fused to obtain the replaced face image.
The size of the current basic face image may be enlarged in various manners, for example, the current basic face image may be up-sampled, so as to enlarge the size of the current basic face image, and obtain an enlarged basic face image, or the size of the current basic face image is compared with a preset target size, an enlargement parameter is determined based on a comparison result, and the current basic face image is enlarged to the preset target size based on the enlargement parameter, so as to obtain an enlarged basic face image.
The method includes the steps of fusing an amplified basic face image and a target basic face image, for example, the amplified basic face image and the target basic face image can be fused to obtain a fused basic face image, the size of the fused basic face image is amplified to obtain a target amplified basic face image, the target basic face image is taken as a current basic face image, the target amplified basic face image is taken as an amplified basic face image, the step of screening a basic face image next to the current basic face image in the basic face image according to sorting information is carried out, and the screened target basic face image is fused with the amplified basic face image until the last basic face image is screened to obtain a replaced basic face image.
The method of fusing the amplified basic face image and the target basic face image may be various, for example, the amplified basic face image is directly superimposed on the target basic face image, so as to obtain a fused basic face image, or each pixel value of the amplified basic face image in the RGB color space is fused with a pixel value of a pixel at a position corresponding to the target basic face image, so as to obtain a fused basic face image. By the cascade amplification structure, in the up-sampling process, the basic face image is amplified and added to the basic face image of the next size until the replaced face image of the target size is finally obtained, which can be shown in fig. 3.
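A minimal sketch of this cascade amplification, assuming the basic face images are NCHW tensors already sorted from smallest to largest (illustrative only, not code from the disclosure):

    import torch
    import torch.nn.functional as F

    def cascade_upsample(basic_face_images: list) -> torch.Tensor:
        """Cascade amplification: `basic_face_images` holds image tensors sorted from
        the smallest size to the largest (e.g. 32x32, 64x64, ..., 256x256). Each partial
        result is enlarged to the next size and added to the next basic face image."""
        out = basic_face_images[0]
        for nxt in basic_face_images[1:]:
            out = F.interpolate(out, size=nxt.shape[-2:], mode="bilinear", align_corners=False)
            out = out + nxt               # fuse the enlarged image with the next basic face image
        return out                        # replaced face image at the target size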
Taking the face image sample as a target face image and the face template image sample as a face template image as an example, face replacement on the face template image may proceed as follows: a face encoder is used to extract identity features from the target face image to obtain face features; a feature fuser is used to extract style features from the face template image; the style features and the face features are fused through the AdaIN network in the feature fuser to obtain initial fused face features; an attention mask mechanism is used to fuse the initial fused face features with a face region mask to obtain fused face features; a basic face image is then generated from the fused face features; and the basic face image is upsampled, with cascade amplification applied during upsampling, so as to obtain the replaced face image at the target size, as shown in fig. 4.
104. And reconstructing the facial image sample based on the sample facial features to obtain a reconstructed facial image.
For example, the preset image processing model may be used to perform feature extraction on the facial image sample to obtain sample facial style features; the sample facial style features and the sample facial features are fused at multiple sizes to obtain second fused sample facial features at each size; and based on the second fused sample facial features, face replacement is performed on the facial image sample to obtain the reconstructed facial image.
The manner of multi-size fusion of the sample facial style features and the sample facial features is the same as the manner of multi-size fusion of the facial style features and the sample facial features, which is described in detail above, and will not be described in detail here. In addition, based on the facial features of the second fused sample, the facial replacement of the facial image sample is performed in the same manner as the facial replacement of the facial template image sample based on the facial features of the first fused sample, which is described in detail above, and will not be described again.
105. And converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and replacing the face of the face image to be processed by adopting the trained image processing model.
The convergence of the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image may be performed in various manners, and specifically may be as follows:
for example, the perceptual loss information of the pair of face image samples is determined from the face image sample, the replaced face image, and the reconstructed face image; the adversarial loss information of the pair of face image samples is determined based on the replaced face image and the face image sample; the perceptual loss information and the adversarial loss information are fused; and the preset image processing model is converged based on the fused loss information, so as to obtain the trained image processing model.
The perceptual loss information may be understood as loss information of a perceptual level of a face region between the face image sample and the replaced face image, and between the face image sample and the reconstructed face image. There may be various ways of determining the perceived loss information of the pair of face image samples from the face image sample, the replaced face image, and the reconstructed face image, for example, the reconstructed perceived loss information of the pair of face image samples may be determined from the face image sample and the reconstructed face image, the tag perceived loss information of the pair of face image samples may be determined from the face image sample and the replaced face image, and the reconstructed perceived loss information and the tag perceived loss information may be fused to obtain the perceived loss information of the pair of face image samples.
The method for determining the label perception loss information of the face image sample pair according to the face image sample and the replaced face image may be various, for example, the face label information in the face image sample may be obtained, the face label area is identified in the face image sample according to the face label information, the target face area corresponding to the face label area is identified in the replaced face image, the feature extraction is performed on the face label area to obtain the face label area feature, the feature extraction is performed on the target face area to obtain the target face area feature, the feature difference between the target face area feature and the face label area feature is calculated, and the label perception loss information of the face image sample pair is obtained.
The feature extraction of the facial tag region and the target facial region, and the calculation of the feature difference between the extracted target facial region feature and the facial tag region feature may be performed by a VGG (an image recognition network) network pre-trained on the facial recognition task.
The reconstructed perceptual loss information is used for indicating loss information of a perceptual layer of a face region between the face image sample and the reconstructed face image, and a VGG network can be used for extracting features of the face image sample and the face region of the reconstructed face image respectively and calculating feature difference values of the extracted features so as to obtain the reconstructed perceptual loss information.
After the reconstruction loss information and the tag perception loss information are determined, the reconstruction loss information and the tag perception loss information can be fused, and various fusion modes can be adopted, for example, the reconstruction loss information and the tag perception loss information can be directly added, so that the perception loss information of the facial image sample pair is obtained, and the method specifically can be as shown in a formula (2):
L_perceptual = ||VGG(G(X_src, X_id)) − VGG(X_dst)|| + ||VGG(G(X_id, X_id)) − VGG(X_id)||  (2)

wherein L_perceptual is the perceptual loss information, VGG is the VGG network pre-trained on the face recognition task, X_src is the facial template image sample, X_id is the facial image sample, G is the generator, G(X_src, X_id) is the replaced facial image, X_dst is the label in the facial image sample, and G(X_id, X_id) is the reconstructed facial image.
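For illustration only, formula (2) can be sketched as follows, assuming `vgg` returns a feature tensor from the face-recognition-pretrained VGG network and using an L1 norm for the ||·|| terms; all names are placeholders, not part of the disclosure:

    import torch

    def perceptual_loss(vgg, G, X_src, X_id, X_dst) -> torch.Tensor:
        """Formula (2): label perceptual loss on the swapped face plus reconstruction
        perceptual loss on the rebuilt source face, both in VGG feature space."""
        swap_loss  = (vgg(G(X_src, X_id)) - vgg(X_dst)).abs().mean()   # label perceptual loss
        recon_loss = (vgg(G(X_id, X_id)) - vgg(X_id)).abs().mean()     # reconstruction perceptual loss
        return swap_loss + recon_loss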
Optionally, a preset loss weight may be further obtained, based on the preset loss weight, the reconstructed perceptual loss information and the tag perceptual loss information are weighted respectively, and the weighted reconstructed perceptual loss information and the tag perceptual loss information are fused to obtain the perceptual loss information of the facial image sample pair.
While the perceptual loss information is determined, the adversarial loss information of the face image sample pair may also be determined. This can be done in various ways; for example, a discriminator may be trained, and the adversarial loss between the face image sample and the replaced face image may be calculated using the trained face discriminator and a facial-feature auxiliary discriminator to obtain the adversarial loss information, as shown in formula (3):
L_adv(G, D_s) = Σ log D_s(X_dst) + Σ (1 − log D_s(G(X_src, X_id)))  (3)

wherein L_adv(G, D_s) is the adversarial loss information, D_s is the face discriminator and the facial-feature auxiliary discriminator, G is the generator, X_dst is the label in the facial image sample, X_src is the facial template image sample, X_id is the facial image sample, and G(X_src, X_id) is the replaced facial image.
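A literal sketch of formula (3) as written, assuming the discriminator D_s outputs probabilities in (0, 1]; the small epsilon is added only for numerical stability and is not part of the formula:

    import torch

    def adversarial_loss(D_s, G, X_src, X_id, X_dst) -> torch.Tensor:
        """Formula (3): real label faces should score high under D_s,
        generated (replaced) faces should score low."""
        eps = 1e-8                                                     # numerical stability only
        real_term = torch.log(D_s(X_dst) + eps).sum()
        fake_term = (1.0 - torch.log(D_s(G(X_src, X_id)) + eps)).sum()
        return real_term + fake_term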
After the perceptual loss information and the adversarial loss information are calculated, they may be fused in various ways. For example, the perceptual loss information and the adversarial loss information may be directly added to obtain the fused loss information; or preset weighting parameters may be obtained, the perceptual loss information and the adversarial loss information are weighted based on the preset weighting parameters, and the weighted perceptual loss information and the weighted adversarial loss information are fused to obtain the fused loss information.
After the perceptual loss information and the adversarial loss information are fused, the preset image processing model can be converged based on the fused loss information. The convergence may be carried out in various ways; for example, a gradient descent algorithm may be adopted to update the network parameters of the preset image processing model according to the fused loss information until convergence is reached, thereby obtaining the trained image processing model.
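As a rough illustration of one such convergence step, the sketch below combines the perceptual and adversarial terms and applies gradient descent. The generator/discriminator interfaces, the use of a sigmoid-output discriminator, and `perceptual_fn` (e.g. the `perceptual_loss` above with the VGG extractor bound in) are assumptions; the adversarial term is written in a standard GAN form for illustration rather than the exact expression of formula (3).

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt,
               x_src, x_id, x_dst, perceptual_fn, lambda_adv=1.0):
    # Discriminator update: real labels vs. generated (replaced) faces.
    swapped = generator(x_src, x_id)
    d_real = discriminator(x_dst)                 # outputs assumed in (0, 1)
    d_fake = discriminator(swapped.detach())
    d_loss = -(torch.log(d_real + 1e-8).mean()
               + torch.log(1.0 - d_fake + 1e-8).mean())
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: fused perceptual + adversarial loss.
    reconstructed = generator(x_id, x_id)
    g_adv = -torch.log(discriminator(swapped) + 1e-8).mean()
    g_loss = perceptual_fn(swapped, reconstructed, x_dst, x_id) + lambda_adv * g_adv
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```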
Taking a human face image as an example, the training process of the preset image processing model may be as follows: the target image is input into a face coding network to obtain a target image feature vector; the template image and the feature vector are input into a feature fusion network, where the feature fusion network is similar to a U-Net structure, Adaptive Instance Normalization (AdaIN) is used in the Residual blocks, and a cascaded amplification structure and an attention mask mechanism are used in the up-sampling stage, so that the replaced face image is obtained; then the target image and the feature vector are input into the feature fusion network to obtain a reconstructed target image. When training the preset image processing model, a VGG network pre-trained on the face recognition task is used to compute the perceptual losses between the result image and the label, and between the target image and the reconstructed target image; when training the discriminators, the adversarial losses are obtained by using the face discriminator and the five-sense-organ auxiliary discriminator. Finally, the loss function is minimized by gradient descent, so that the trained image processing model is obtained, as shown in fig. 5.
After the preset image processing model is trained, face replacement can be performed on the facial image to be processed based on the trained image processing model. This may be carried out in various ways; for example, the facial image to be processed and the facial template image corresponding to it are obtained, feature extraction is performed on the facial image to be processed and the facial template image respectively by using the trained image processing model to obtain the facial features of the facial image to be processed and the facial template features of the facial template image, the facial features and the facial template features are fused and a fused facial image is constructed based on the fused facial features, face region segmentation is performed on the facial template image and the fused facial image to obtain face regions, and face fusion is performed on the fused facial image and the facial template image based on the face regions to obtain the target facial image, specifically as follows:
S1, acquiring a face image to be processed and a face template image corresponding to the face image to be processed.
The face image may be an image including a face region of the subject, and the face template image corresponding to the face image may be a template image in which the face region in the face image is subjected to face replacement. In the face replacement process, the face region in the face template image may be replaced with the face region in the face image.
The method for obtaining the face image to be processed and the face template image corresponding to the face image to be processed may be various, and may specifically be as follows:
for example, the facial image to be processed and the facial template image sent by the terminal may be obtained directly; or the facial image to be processed may be obtained and the facial template image corresponding to it screened out from a face analog image database; or an original facial image may be obtained from a network or an image database, a facial image pair screened out from the original facial images, one facial image of the pair taken as the facial image to be processed and the other taken as the facial template image; or, when the number of facial images to be processed and facial template images is large or they occupy a large amount of memory, a facial image processing request may be received, where the request carries the storage addresses of the facial image to be processed and of the corresponding facial template image, and the facial image to be processed and the facial template image are obtained according to the storage addresses.
S2, respectively extracting features of the face image to be processed and the face template image by adopting the trained image processing model to obtain the face features of the face image to be processed and the face template features of the face template image.
Wherein the facial features are used for indicating the feature information of the style of the facial region of the facial image to be processed, and the facial template features are used for indicating the feature information of the style of the facial region of the facial template image.
The feature extraction method for the face image to be processed and the face template image by adopting the trained image processing model can be various, and specifically can be as follows:
for example, a face style feature extraction network of the trained image processing model may be used to perform face style feature extraction on the face image to be processed to obtain a face feature, and a face style feature extraction network may be used to perform face style feature extraction on the face template image to obtain a face template feature.
The face style feature extraction method may be various, for example, a face style (stylegan) encoder may be used to perform face encoding on a to-be-processed face image to obtain a 512-dimensional face vector, the face vector is used as a face feature, and a stylegan encoder is used to perform face encoding on a face template image to obtain a 512-dimensional face template vector, and the face template vector is used as a face template feature.
S3, fusing the facial features and the facial template features, and constructing a fused facial image based on the fused facial features.
The fused facial image may be a facial image generated from the fused facial features fused based on the facial features and the facial template features.
The way of fusing the facial features and the facial template features may be various, and specifically may be as follows:
for example, the facial features are converted into a preset number of basic facial features, convolution layers corresponding to each basic facial feature are determined, the facial template features are converted into a preset number of basic facial template features, the convolution layers corresponding to each basic facial template feature are determined, and the basic facial features and the basic facial template features are fused according to the convolution layers to obtain fused facial features.
There are various ways to convert the facial features into the preset number of basic facial features. For example, a mapping network may be used to map the 512-dimensional facial vector into a 14×512-dimensional facial vector (for 256×256 resolution), so as to obtain 14 basic facial features. After the conversion, the 14 basic facial features can be input as the noise of 14 convolution layers respectively, so that the convolution layer corresponding to each basic facial feature can be determined.
The method for converting the facial template features into basic facial template features and determining the convolution layer corresponding to each basic facial template feature is the same as the method for converting the basic facial features and determining the corresponding convolution layer, which are described in detail above, and will not be described in detail here.
After the basic facial features and the basic facial template features are converted, the basic facial features and the basic facial template features can be fused based on the basic facial features and the convolution layers corresponding to the basic facial template features, and various fusion modes can be adopted, for example, the basic facial features corresponding to the preset first convolution layer are screened out of the basic facial features to obtain target basic facial features, the basic facial template features corresponding to the second convolution layer are screened out of the basic facial template features to obtain target basic facial template features, and the convolution layers are adopted to fuse the target basic facial features and the target basic facial template features to obtain fused facial features.
The features controlled by each convolution layer differ in at least the type of facial attribute, so the noise input to different stylegan convolution layers affects the result differently. The lower-layer noise vectors mainly influence the face shape, orientation, position and similar attributes of the generated face; the middle-layer noise vectors mainly affect the contour and shape of the five sense organs; the higher-layer noise vectors mainly affect the color, texture and illumination of the generated face. For example, taking 14 convolution layers as an example, layers 1-3 may indicate the facial position and orientation, layers 4-12 may indicate the contour and shape of the five sense organs, and layers 13-14 may indicate the color, texture and illumination. In the face fusion process, the facial position and orientation of the fused facial image should be similar to those of the facial template image, the contour and shape of the five sense organs of the fused facial image should be similar to those of the facial image to be processed, and the color, texture and illumination of the fused facial image should be similar to those of the facial template image; therefore, the preset first convolution layers may be layers 4 to 12, and the preset second convolution layers may be layers 1 to 3 and layers 13 to 14. When the number of convolution layers is not 14, the preset first convolution layers and the preset second convolution layers are determined according to the type of facial attribute indicated by each convolution layer.
After the target basic facial features and the target basic facial template features are screened out, they can be fused in various ways. For example, they may be fused layer by layer: taking 14 convolution layers as an example, the features of layers 1-3 of the fused facial features are the target basic facial template features of layers 1-3, the features of layers 4-12 of the fused facial features are the target basic facial features of layers 4-12, and the features of layers 13-14 of the fused facial features are the target basic facial template features of layers 13-14, so that the fused facial features are obtained.
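A minimal sketch of this layer-wise latent mixing is given below, assuming 14 per-layer 512-dimensional style codes (e.g. produced by a StyleGAN-style mapping network). The 1-3 / 4-12 / 13-14 split follows the text; the array shapes and function name are illustrative assumptions.

```python
import numpy as np

def mix_latents(face_codes, template_codes):
    """face_codes, template_codes: arrays of shape (14, 512)."""
    fused = template_codes.copy()      # layers 1-3 and 13-14 come from the template
    fused[3:12] = face_codes[3:12]     # layers 4-12 (0-based indices 3..11) from the face
    return fused

# Example with random placeholder codes; the fused code would then be fed to
# the synthesis (generator) network to produce the fused facial image.
fused_w = mix_latents(np.random.randn(14, 512), np.random.randn(14, 512))
```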
After the facial features and the facial template features are fused, a fused facial image can be constructed based on the fused facial features, and various ways of constructing the fused facial image can be used, for example, the fused facial features can be input to a stylegan generator, so as to obtain the fused facial image.
S4, carrying out face region segmentation on the face template image and the fused face image to obtain a face region.
The face region may be a region indicating a face in the face template image and the fused face image.
The area segmentation method for the facial template image and the fused facial image may be various, and specifically may be as follows:
for example, the multi-dimensional feature extraction is performed on the fused face image to obtain multi-dimensional face region features, the multi-dimensional feature extraction is performed on the face template image to obtain multi-dimensional template face region features, the fused face region is segmented in the fused face image according to the face region features, the template face region is segmented in the face template image according to the template face region features, and the fused face region and the template face region are taken as face regions.
The method includes that according to facial region characteristics, a fused facial region is segmented in a fused facial image, for example, global pooling is performed on the facial region characteristics to obtain first pooled facial region characteristics, a first attention weight of each facial region characteristic is determined based on the first pooled facial region characteristics, and according to the first attention weights, the fused facial region is segmented in the fused facial image.
The first attention weight of each facial region feature may be determined based on the first pooled facial region features in various ways; for example, the first pooled facial region features may be processed by a fully-connected layer and an activation function, so as to obtain the first attention weight of each feature channel of each facial region feature. Determining the first attention weight is mainly done using an attention block comprising a global pooling layer and a 1×1 convolution layer (conv 1×1), as shown in fig. 6.
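The following sketch shows one way such an attention block could be realised: global pooling followed by a 1×1 convolution (acting as the fully-connected step) and an activation that yields per-channel weights. The sigmoid activation and layer sizes are assumptions for illustration.

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                        # global pooling
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)   # conv 1x1
        self.act = nn.Sigmoid()

    def forward(self, features):
        # Per-channel attention weights for the face region features.
        weights = self.act(self.conv(self.pool(features)))
        # Weighted face region features and the weights themselves.
        return features * weights, weights
```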
After determining the first attention weight of each facial region feature, the fused facial region may be segmented in the fused facial image according to the first attention weight, and the fused facial region may be segmented in a plurality of ways, for example, the facial region features may be weighted based on the first attention weight to obtain first weighted facial region features, the second attention weight of the first weighted facial region features may be determined, and the fused facial region may be segmented in the fused facial image according to the first weighted facial region features and the second attention weight.
The second Attention weight determining method of the first weighted face area feature is the same as the first Attention weight determining method, and is implemented by using an Attention module (Attention block), which is described in detail above and will not be repeated here. After determining the second attention weight, the fused face region may be segmented in the fused face image according to the first weighted face region feature and the second attention weight, and the fused face region may be segmented in a plurality of ways, for example, the first weighted face region feature may be weighted based on the second attention weight to obtain the second weighted face region feature, the current face mask may be segmented in the fused face image according to the first weighted face region feature and the second weighted face region feature, and the region corresponding to the current mask may be identified in the fused face image to obtain the fused face region.
The method for segmenting the current facial mask in the fused facial image according to the first weighted facial region feature and the second weighted facial region feature may be multiple, for example, the first weighted facial region feature and the second weighted facial region feature are fused to obtain a first fused facial region feature, the first weighted facial region feature and the facial region feature are fused to obtain a second fused facial region feature, the first fused facial region feature and the second fused facial region feature are decoded respectively to obtain a first decoded facial region feature and a second decoded facial region feature, the first decoded facial region feature and the second decoded facial region feature are fused to obtain a third fused facial region feature, the third fused facial region feature is decoded to obtain a third decoded facial region feature, and the mask of the facial region is segmented in the fused facial image according to the third fused facial region feature, which may be specifically shown in fig. 7.
After the current face mask is segmented, the region corresponding to the current face mask can be identified in the fused face image, and various identification modes can be adopted, for example, the current face mask and the fused face image are overlapped, so that the region corresponding to the current face mask can be identified, the region corresponding to the current face mask is taken as the fused face region, or the position information of the current face mask is acquired, and the region corresponding to the position information is identified in the fused face image according to the position information, so that the fused face region is obtained.
The process of dividing the template face area in the face template image according to the template face area features is the same as the process of dividing the fused face area in the fused face image, and detailed description is omitted here. The fused face region and the template face region are taken as face regions.
S5, based on the face area, carrying out face fusion on the fusion face image and the face template image to obtain a target face image.
For example, color correction may be performed on the fused face image according to the fused face region and the template face region to obtain a corrected fused face image, face key points may be detected in the face template image to obtain first face key points, and face key points may be detected in the corrected fused face image to obtain second face key points, and the fused face image and the face template image may be subjected to face fusion according to the first face key points and the second face key points to obtain the target face image.
The method for performing color correction on the fused face image according to the fused face area and the template face area may include, for example, performing color space conversion on the fused face area and the template face area to obtain a target fused face area and a target template face area in a target color space, calculating color parameters of the target fused face area to obtain a first color parameter, calculating color parameters of the target template face area to obtain a second color parameter, and correcting each color channel of the target fused face area according to the first color parameter and the second color parameter to obtain a corrected fused face image.
The method of performing color space conversion on the fused face area and the template face area may be various, for example, the three-channel fused face area and the three-channel template face area may be converted from RGB color space to LAB color space, and the LAB color space is used as the target color space, so as to obtain the target fused face area and the target template face area in the LAB color space.
After the fusion face region and the template face region are converted from the RGB color space to the LAB color space, the first color parameter corresponding to the converted target fusion face region and the second color parameter corresponding to the target template face region may be calculated. The first color parameter includes a first pixel mean and a first pixel variance, and the second color parameter includes a second pixel mean and a second pixel variance. There may be various ways to calculate the parameters of the first color, for example, a current pixel value of each color channel of the target fusion face area in the LAB color space is obtained, and then the current pixel value and the mean value and the variance are calculated, so as to obtain a first pixel mean value and a first pixel variance of each color channel, and the first pixel mean value and the first pixel variance are used as the first color parameters. The manner of calculating the second color parameter is the same as that of calculating the first color parameter, and will not be described in detail here.
After the first color parameter and the second color parameter are calculated, each color channel of the target fused face region can be corrected according to the first color parameter and the second color parameter to obtain the corrected fused face image. The correction can be carried out in various ways; for example, the variance ratio between the first pixel variance and the second pixel variance under each color channel is calculated, the pixel difference between the current pixel value and the first pixel mean under each color channel is calculated, and the pixel difference, the variance ratio and the second pixel mean are fused to obtain the target pixel value of each color channel. Taking the L/A/B color channels as an example, the calculation of the target pixel value of each color channel can be as shown in formula (4):
l'_1 = (σ_{l,2}² / σ_{l,1}²)(l_1 - μ_{l,1}) + μ_{l,2},  a'_1 = (σ_{a,2}² / σ_{a,1}²)(a_1 - μ_{a,1}) + μ_{a,2},  b'_1 = (σ_{b,2}² / σ_{b,1}²)(b_1 - μ_{b,1}) + μ_{b,2}    (4)

where l'_1, a'_1 and b'_1 are respectively the target pixel values under the L/A/B color channels, l_1, a_1 and b_1 are respectively the current pixel values under the L/A/B color channels, μ_{l,1}, μ_{a,1} and μ_{b,1} are respectively the first pixel means under the L/A/B color channels, μ_{l,2}, μ_{a,2} and μ_{b,2} are respectively the second pixel means under the L/A/B color channels, σ_{l,1}², σ_{a,1}² and σ_{b,1}² are respectively the first pixel variances under the L/A/B color channels, and σ_{l,2}², σ_{a,2}² and σ_{b,2}² are respectively the second pixel variances under the L/A/B color channels.
After the target pixel value of each color channel is calculated, the current pixel value of each color channel of the target fused face region may be replaced with a corresponding target pixel value, thereby obtaining a corrected fused face image.
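A hedged sketch of this LAB colour correction using OpenCV follows. The per-channel statistics are taken over the masked face regions as reconstructed from formula (4); the float conversion, mask handling and function name are assumptions for illustration.

```python
import cv2
import numpy as np

def correct_color(fused_bgr, template_bgr, fused_mask, template_mask):
    fused_lab = cv2.cvtColor(fused_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    templ_lab = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    out = fused_lab.copy()
    for c in range(3):  # L, A, B channels
        src = fused_lab[..., c][fused_mask > 0]      # target fused face region
        ref = templ_lab[..., c][template_mask > 0]   # target template face region
        mu1, var1 = src.mean(), src.var() + 1e-6     # first pixel mean / variance
        mu2, var2 = ref.mean(), ref.var()            # second pixel mean / variance
        # Formula (4): target value = (var2 / var1) * (current - mu1) + mu2
        out[..., c][fused_mask > 0] = (var2 / var1) * (src - mu1) + mu2
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)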
After the corrected fused face image is obtained, the face key points can be detected in the face template image to obtain the first face key points, and various face key point detection modes can be adopted, for example, a key point detection network of a trained image processing model can be adopted to perform feature extraction on the face template image to obtain light-weight face key point features with multiple dimensions, the light-weight face key point features are fused to obtain face key point features, and the face key points are identified in the face template image according to the face key point features to obtain the first face key points.
The method for detecting the facial key points in the corrected fused facial image is the same as the process for detecting the key points in the facial template image, and is described in detail above, and will not be repeated here.
After the first facial key point and the second facial key point are detected, the face fusion can be performed on the fused face image and the face template image, and various face fusion modes can be performed, for example, the first facial key point and the second facial key point are compared to obtain key point deformation information, the first facial key point is deformed based on the key point deformation information to obtain a deformed face template image, and the fused face image and the deformed face template image are fused to obtain a target face image.
The method for deforming the first facial key point based on the key point deformation information may be various, for example, the lightweight facial key point feature is adjusted based on the key point deformation information, interpolation information for deforming the first facial key point to the second facial key point is determined according to the adjusted key point facial feature, and abnormal key points different from the first facial key point and the second facial key point are deformed based on the interpolation information, so as to obtain the deformed facial template image.
There may be various ways to deform, based on the interpolation information, the abnormal key points that differ between the first facial key points and the second facial key points; for example, a thin-plate-spline interpolation method may be adopted to deform the abnormal key points based on the interpolation information, so as to obtain the deformed facial template image.
The processing of the facial template image may be regarded as continuing to deform the first facial key points after they are detected. Specifically, as shown in fig. 8, key-point feature extraction may be performed on the facial template image by using MobileNet-V2 blocks (a lightweight convolutional neural network that can be deployed on a mobile terminal) to extract the lightweight facial key-point features; the lightweight facial key-point features are then fused by a convolution layer to obtain the facial key-point features, the facial key-point features are normalized by a fully-connected (FC) layer, and the facial key points in the facial template image are determined based on the normalized facial key-point features, thereby obtaining the first facial key points. After the first facial key points are detected, they need to be deformed, and the deformation is mainly completed by thin-plate-spline interpolation, so as to obtain the deformed facial template image.
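The sketch below illustrates a key-point-driven thin-plate-spline warp built from generic SciPy interpolation, not the patent's own implementation; the key-point arrays, color-image assumption and the use of a dense backward displacement field are assumptions.

```python
import numpy as np
from scipy.interpolate import Rbf
from scipy.ndimage import map_coordinates

def tps_warp(image, src_pts, dst_pts):
    """Warp `image` so that content at src_pts moves to dst_pts.
    src_pts, dst_pts: (N, 2) arrays of (row, col) key points."""
    h, w = image.shape[:2]
    # Thin-plate-spline interpolation of the backward mapping (output -> input).
    fr = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 0], function='thin_plate')
    fc = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 1], function='thin_plate')
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    map_r, map_c = fr(rows, cols), fc(rows, cols)
    # Bilinear resampling of each channel at the warped coordinates.
    warped = np.stack([map_coordinates(image[..., ch], [map_r, map_c], order=1)
                       for ch in range(image.shape[2])], axis=-1)
    return warped
```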
After the first face key point is deformed, the fused face image and the deformed face template image can be fused, and various fusion modes can be adopted, for example, the fused face image and the deformed face template image are subjected to mixed gradient fusion (poisson fusion), so that the target face image is obtained.
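For the mixed-gradient (Poisson) fusion step, OpenCV's seamless cloning can serve as a sketch; the mask format and centre computation are assumptions for illustration.

```python
import cv2
import numpy as np

def poisson_fuse(fused_face_bgr, template_bgr, face_mask):
    """face_mask: uint8 mask, 255 inside the face region to blend."""
    ys, xs = np.where(face_mask > 0)
    center = (int(xs.mean()), int(ys.mean()))   # centre of the face region
    return cv2.seamlessClone(fused_face_bgr, template_bgr,
                             face_mask, center, cv2.MIXED_CLONE)
```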
It should be noted that the process of performing face fusion on the facial image to be processed and the facial template image may be regarded as two parts. One part is to perform feature extraction on the facial image to be processed and the facial template image respectively, fuse the extracted features layer by layer, and reconstruct the fused facial features into the fused facial image by the stylegan generator. The other part is to perform face fusion on the fused facial image and the facial template image: the face regions of the fused facial image and the facial template image are segmented respectively, the facial key points of the fused facial image and the facial template image are identified, the facial template image is deformed based on the identified facial key points, and finally mixed-gradient (Poisson) fusion is performed on the deformed facial template image and the fused facial image, thereby obtaining the target facial image, as shown in fig. 9.
Taking the case where the facial image is a human face image as an example, this scheme synthesizes a large amount of paired template-face/target-face/reconstructed-face triplet data for training by means of StyleGAN latent-space vector fusion. In terms of network structure, an end-to-end network for paired data is designed based on pixel2pixel. The generator consists of a face coding network and a feature fusion network, adopts adaptive instance normalization for feature fusion, and uses a cascaded amplification structure to increase generation detail. Before training, the positions of the five sense organs in each image are marked with a facial key-point tool; during training, the eye, nose and mouth regions are trained separately with the five-sense-organ auxiliary discriminator, so that the details of each part are enhanced. In addition, adaptive data augmentation is adopted during training, which effectively strengthens the discrimination ability of the discriminator. Since paired data are used, strong supervision information is available, training is stable, and bad cases are unlikely to appear. Moreover, by compressing the parameters and computation of the generator, a lightweight network can be obtained for deployment on mobile devices. In addition, when the face of the object in the facial image to be processed is occluded by glasses or other objects, a better stylegan model can be adopted for synthesis, a better triplet data set can be obtained after stricter manual selection, and a better image processing model can be obtained through training.
Optionally, in an embodiment, the image processing method further includes storing the replaced face image, the reconstructed face image, the fused face image, the target face image, and the like to a blockchain.
As can be seen from the above, in the embodiment of the present application, after obtaining at least one facial image sample pair, performing feature extraction on a facial image sample by using a preset image processing model to obtain a sample facial feature, performing facial replacement on a facial template image sample according to the sample facial feature to obtain a replaced facial image, reconstructing the facial image sample based on the sample facial feature to obtain a reconstructed facial image, and then converging the preset image processing model according to the facial image sample, the replaced facial image and the reconstructed facial image to obtain a trained image processing model, and performing facial replacement on the facial image to be processed by using the trained image processing model; according to the scheme, after the feature extraction is carried out on the facial image sample, the facial image sample can be reconstructed, so that a triplet data set of the facial image sample, the facial template image sample and the reconstructed facial image is formed, and the preset image processing model is converged based on the triplet data, so that the stability and the processing precision of image processing training are greatly improved, and the accuracy of image processing can be improved.
According to the method described in the above embodiments, examples are described in further detail below.
In this embodiment, the description will be given by taking an example in which the image processing apparatus is specifically integrated in an electronic device, the electronic device is a server, and the facial image is a human face image.
As shown in fig. 10, an image processing method specifically includes the following steps:
201. the server obtains at least one face image sample pair.
For example, the server may receive a face image sample pair sent by the terminal, where the face image sample pair includes a face image sample and a face template image sample; or the server may obtain face image samples from a network or an image database, send the face image samples to a template image server, receive the face template image sample corresponding to each face image sample returned by the template image server, and compose the face image sample and the face template image sample into a face image sample pair; or the server may obtain original face images from a network or an image database and screen out, from the original face images, a face image sample and the face template image sample corresponding to it, thereby obtaining the face image sample pair, where the screening condition may be that the faces of the objects in the face image sample and the face template image sample are different; or, when the number of face image sample pairs is large or they occupy a large amount of memory, the server may receive a face image processing request, where the request carries the storage address of the face image sample pairs, and obtain the face image sample pairs according to the storage address.
202. And the server adopts a preset image processing model to extract the characteristics of the face image sample, so as to obtain the characteristics of the sample face.
For example, the server may perform convolution feature extraction on the face image sample by using a feature extraction network of a preset image processing model to obtain an initial sample face feature, and then process the initial sample face feature by using a full connection layer to obtain a 256-dimensional feature vector, and use the feature vector as the sample face feature.
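As a rough illustration of this step, the sketch below shows a convolutional backbone followed by a fully-connected layer producing a 256-dimensional face feature vector; the backbone depth, layer sizes and module names are assumptions, not the patent's exact feature extraction network.

```python
import torch.nn as nn

class FaceEncoder(nn.Module):
    def __init__(self, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(                             # convolution feature extraction
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, feature_dim)                  # full-connection layer

    def forward(self, face_image):                             # face_image: (N, 3, H, W)
        initial_feat = self.conv(face_image).flatten(1)        # initial sample face feature
        return self.fc(initial_feat)                           # 256-dim sample face feature
```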
203. And the server replaces the face of the face template image sample according to the face characteristics of the sample, and a replaced face image is obtained.
For example, the server may perform feature extraction on the face template image by using a preset image processing model to obtain face style features, extract basic face style features under each size from the face style features, extract basic sample face features under each size from the sample face features, calculate style feature parameters of the basic face style features and face feature parameters of the basic sample face features, and fuse the basic face style features and the basic sample face features based on the style feature parameters and the face feature parameters to obtain first fused sample face features under each size.
For the style characteristic parameters, the server can acquire the face style characteristic value under each characteristic channel in the basic face style characteristic, calculate the variance of the face style characteristic value to obtain the style characteristic variance, calculate the mean value of the face style characteristic value to obtain the style characteristic mean value, and take the style characteristic variance and the style characteristic mean value as the style characteristic parameters. For the face feature parameters, face feature values under each feature channel in the basic sample face feature can be obtained, then, the variance of the face feature values is calculated to obtain face feature variances, the average value of the face feature values is calculated to obtain face feature average values, and the face feature variances and the face feature average values are used as the face feature parameters.
The server calculates a variance ratio of the style characteristic variance and the face characteristic variance, calculates a characteristic difference value of a characteristic value of the basic face characteristic and a characteristic mean value of the face characteristic, and fuses the variance ratio, the characteristic difference value and the style characteristic mean value to obtain an initial fused sample face characteristic under each size, wherein the initial fused sample face characteristic can be specifically shown as a formula (1).
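A minimal adaptive-instance-normalisation (AdaIN) style sketch of this fusion is shown below, where the content features correspond to the basic sample face features and the style features to the basic face style features. Standard AdaIN scales by standard deviation, which is how the text's variance/mean style parameters are interpreted here; feature-map shapes and the epsilon are assumptions.

```python
import torch

def adain_fuse(content_feat, style_feat, eps=1e-5):
    """content_feat, style_feat: feature maps of shape (N, C, H, W)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)        # face feature mean
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps    # face feature spread
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)          # style feature mean
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps      # style feature spread
    # fused = (style spread / face spread) * (content - face mean) + style mean
    return (content_feat - c_mean) * (s_std / c_std) + s_mean
```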
The server extracts the association feature from the initial fused sample face feature, determines the association weight corresponding to the initial fused sample face feature according to the association feature, weights the pixel value of each pixel of the face image sample based on the association weight, so as to cut out the target face area in the weighted pixel value and convert the target face area into a face area mask, or weights the feature value under each feature channel of the sample face feature of the face image sample based on the association weight, and identifies the target face area in the face image sample based on the weighted sample face feature and converts the target face area into the face area mask. The face features of the initial fused sample are multiplied by the face region mask, so that the face features of the first fused sample under each size are obtained, or alternatively, the feature value corresponding to each pixel in the face region mask can be multiplied by the feature of the corresponding pixel position in the face features of the initial fused sample, so that the face features of the first fused sample under each size are obtained.
The server may directly convert the first fused sample face feature into a basic face image whose size corresponds to the first fused sample face feature by using an image operator (toRGB), or may convert the first fused sample face feature into pixel features in the RGB color space by using the toRGB operator and generate the basic face image of the corresponding size based on the pixel features. The basic face images are sorted according to their sizes, and the basic face image with the smallest size is screened out to obtain the current basic face image; the current basic face image is up-sampled to enlarge its size, yielding the amplified basic face image; alternatively, the size of the current basic face image may be compared with a preset target size, an amplification parameter determined based on the comparison result, and the current basic face image amplified to the preset target size based on the amplification parameter, yielding the amplified basic face image.
The server screens out the next basic face image of the current basic face image from the basic face images according to the sorting information to obtain a target basic face image, fuses the amplified basic face image and the target basic face image to obtain a fused basic face image, amplifies the size of the fused basic face image to obtain a target amplified basic face image, takes the target basic face image as the current basic face image, takes the target amplified basic face image as the amplified basic face image, and returns to execute the step of screening out the next basic face image of the current basic face image from the basic face images according to the sorting information until the last basic face image is screened out, and directly overlaps the amplified basic face image on the target basic face image to obtain the fused basic face image, or fuses each pixel value of the amplified basic face image in the RGB color space with the pixel value of the corresponding position pixel of the target basic face image to obtain the fused basic face image. And amplifying the basic face image and adding the basic face image into the basic face image of the next size in the up-sampling process through a cascade amplification structure until the replaced face image of the target size is finally obtained.
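The cascaded amplification described above can be sketched as follows: each scale's fused feature is mapped to an image by a toRGB module, and the running image is upsampled and added to the next scale's image. The module list, bilinear upsampling and function name are assumptions for illustration.

```python
import torch.nn.functional as F

def cascade_to_image(fused_feats, to_rgb_layers):
    """fused_feats: per-scale feature maps, smallest first.
    to_rgb_layers: matching toRGB modules (e.g. 1x1 convs to 3 channels)."""
    image = to_rgb_layers[0](fused_feats[0])                    # smallest basic face image
    for feat, to_rgb in zip(fused_feats[1:], to_rgb_layers[1:]):
        image = F.interpolate(image, scale_factor=2,
                              mode='bilinear', align_corners=False)  # amplify current image
        image = image + to_rgb(feat)                            # fuse with the next-size image
    return image                                                # replaced face image at target size
```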
204. The server reconstructs the face image sample based on the sample face characteristics to obtain a reconstructed face image.
For example, the server may perform feature extraction on the face image sample by using a preset image processing model to obtain a sample face style feature, perform multi-size fusion on the sample face style and the sample face feature to obtain a second fused sample under each size, and perform face replacement on the face image sample based on the second fused sample face feature to obtain a reconstructed face image sample.
205. And the server converges the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model.
For example, the server acquires face tag information in the face image sample, identifies the face tag region in the face image sample according to the face tag information, and identifies the target face region corresponding to the face tag region in the replaced face image; feature extraction is performed on the face tag region by a VGG network to obtain face tag region features, feature extraction is performed on the target face region by the VGG network to obtain target face region features, and the feature difference between the target face region features and the face tag region features is calculated, thereby obtaining the tag perceptual loss information of the face image sample pair. The VGG network is also used to extract features of the face regions of the face image sample and of the reconstructed face image respectively, and the feature difference of the extracted features is calculated, thereby obtaining the reconstructed perceptual loss information. The reconstructed perceptual loss information and the tag perceptual loss information are added to obtain the perceptual loss information of the face image sample pair, specifically as shown in formula (2).
The server can train the discriminators and calculate the adversarial loss between the face image sample and the replaced face image by using the trained face discriminator and the five-sense-organ auxiliary discriminator, thereby obtaining the adversarial loss information, specifically as shown in formula (3). The perceptual loss information and the adversarial loss information are added to obtain the fused loss information; alternatively, a preset weighting parameter may be obtained, the perceptual loss information and the adversarial loss information weighted based on the preset weighting parameter, and the weighted perceptual loss information and the weighted adversarial loss information fused to obtain the fused loss information. A gradient descent algorithm is then used to update the network parameters of the preset image processing model according to the fused loss information until convergence is completed, so as to obtain the trained image processing model.
206. The server acquires a face image to be processed and a face template image corresponding to the face image to be processed.
For example, the server may directly acquire the face image to be processed and the face template image sent by the terminal, or may also acquire the face image to be processed, screen the face template image corresponding to the face image to be processed in the face analog image database, or acquire the original face image from the network or the image database, screen the face image pair from the original face image, and take any one face image in the face image pair as the face image to be processed, then the other face image in the face image pair is taken as the face template image, or, when the number of the face images to be processed and the face template images is more or the memory is larger, may also receive a face image processing request, where the face image processing request carries the storage addresses of the face image to be processed and the face template image to be processed, and acquire the face template image corresponding to the face image to be processed and the face image to be processed according to the storage addresses.
207. And the server adopts the trained image processing model to respectively extract the characteristics of the face image to be processed and the face template image to obtain the face characteristics of the face image to be processed and the face template characteristics of the face template image.
For example, the server may perform face coding on the face image to be processed by using a stylegan encoder to obtain 512-dimensional face vectors, use the face vectors as face features, perform face coding on the face template image by using the stylegan encoder to obtain 512-dimensional face template vectors, and use the face template vectors as face template features.
208. The server fuses the face features and the face template features and constructs a fused face image based on the fused face features.
For example, the server maps the 512-dimensional face vector and the face template vector into a 14×512-dimensional basic face vector and a basic face template vector (256×256 resolution) by using a mapping network, so as to obtain 14 basic face features and basic face template features. After the basic face features and the basic face template features are converted, the 14 basic face features and the 14 basic face template features can be respectively used as noise inputs of 14 convolution layers, so that the convolution layers corresponding to each basic face feature and each basic face template feature can be determined.
The server screens out basic face features corresponding to preset 4-12 th convolution layers from basic face features to obtain target basic face features, screens out basic face template features corresponding to 1-3 convolution layers and 13-14 convolution layers from basic face template features to obtain target basic face template features, and starts from a bottom layer, the features of the 1-3 layers of the fused face features are target basic face template features of 1-3 layers, the features of the 4-12 layers of the fused face features are target basic face features of 4-12 layers, and the features of the 13-14 layers of the fused face features are target basic face template features of 13-14 layers, so that the fused face features are obtained. And inputting the fused face features to a stylegan generator, so as to obtain a fused face image.
209. And the server performs face region segmentation on the face template image and the fused face image to obtain a face region.
For example, the server performs multi-dimensional feature extraction on the fused face image to obtain multi-dimensional face region features, and performs multi-dimensional feature extraction on the face template image to obtain multi-dimensional template face region features. Global pooling is performed on the face region features to obtain the first pooled face region features, and an attention module applies a fully-connected layer and an activation function to the first pooled face region features, so as to obtain the first attention weight of each feature channel of each face region feature. The face region features are weighted based on the first attention weights to obtain the first weighted face region features, and the second attention weights of the first weighted face region features are determined; the first weighted face region features may be weighted based on the second attention weights to obtain the second weighted face region features, the first weighted face region features and the second weighted face region features are fused to obtain the first fused face region features, the first weighted face region features and the face region features are fused to obtain the second fused face region features, the first fused face region features and the second fused face region features are decoded respectively to obtain the first decoded face region features and the second decoded face region features, the first decoded face region features and the second decoded face region features are fused to obtain the third fused face region features, the third fused face region features are decoded to obtain the third decoded face region features, and the current face mask is segmented in the fused face image according to the third decoded face region features.
The server overlaps the current face mask and the fused face image, so that the region corresponding to the current face mask can be identified, the region corresponding to the current face mask is used as the fused face region, or the position information of the current face mask is acquired, and the region corresponding to the position information is identified in the fused face image according to the position information, so that the fused face region is obtained.
The process of dividing the template face region in the face template image by the server according to the template face region features is the same as the process of dividing the fused face region in the fused face image, and detailed description is omitted here. And taking the fused face area and the template face area as the face area.
210. And the server performs face fusion on the fused face image and the face template image based on the face region to obtain a target face image.
For example, the server converts the three-channel fused face region and the three-channel template face region from an RGB color space to an LAB color space, and uses the LAB color space as a target color space, thereby obtaining a target fused face region and a target template face region in the LAB color space. The method comprises the steps of obtaining a current pixel value of each color channel of a target fusion face region in an LAB color space, calculating the current pixel value, a mean value and a variance, obtaining a first pixel mean value and a first pixel variance of each color channel, and taking the first pixel mean value and the first pixel variance as first color parameters. The manner of calculating the second color parameter is the same as that of calculating the first color parameter, and will not be described in detail here.
The server calculates the variance ratio of the first pixel variance and the second pixel variance under each color channel, calculates the pixel difference value between the current pixel value and the first pixel mean value under each color channel, and fuses the pixel difference value, the variance ratio and the second pixel mean value to obtain the target pixel mean value of each color channel, wherein the calculation process of the target pixel mean value of each color channel can be shown as a formula (4). After the target pixel value of each color channel is calculated, the current pixel value of each color channel of the target fusion face region can be replaced by the corresponding target pixel value, so that the corrected fusion face image is obtained.
The server can adopt a key point detection network of the trained image processing model to perform feature extraction on the face template image to obtain light-weight face key point features with multiple dimensions, the light-weight face key point features are fused to obtain face key point features, and the face key points are identified in the face template image according to the face key point features to obtain first face key points. In addition, the method for detecting the key points of the face in the corrected fused face image is the same as the process for detecting the key points in the face template image, and detailed description is omitted here.
The server compares the first face key points with the second face key points to obtain key point deformation information, adjusts the lightweight face key-point features based on the key point deformation information, determines, according to the adjusted key-point face features, the interpolation information for deforming the first face key points toward the second face key points, and deforms the abnormal key points based on the interpolation information by thin-plate-spline interpolation to obtain the deformed face template image. Mixed-gradient (Poisson) fusion is then performed on the fused face image and the deformed face template image, thereby obtaining the target face image.
As can be seen from the foregoing, after obtaining at least one face image sample pair, the server in this embodiment performs feature extraction on a face image sample by using a preset image processing model to obtain sample face features, performs face replacement on a face template image sample according to the sample face features to obtain a replaced face image, then reconstructs the face image sample based on the sample face features to obtain a reconstructed face image, and then converges the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and performs face replacement on the face image to be processed by using the trained image processing model; according to the scheme, after the feature extraction is carried out on the face image sample, the face image sample can be reconstructed, so that a triplet data set of the face image sample, the face template image sample and the reconstructed face image is formed, and the preset image processing model is converged based on the triplet data, so that the stability and the processing precision of image processing training are greatly improved, and the accuracy of image processing can be improved.
In order to better implement the above method, the embodiment of the present invention further provides an image processing apparatus, where the image processing apparatus may be integrated into an electronic device, such as a server or a terminal, where the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 11, the image processing apparatus may include an acquisition unit 301, an extraction unit 302, a first replacement unit 303, a reconstruction unit 304, and a second replacement unit 305, as follows:
(1) An acquisition unit 301;
an acquisition unit 301 for acquiring at least one pair of face image samples including a face image sample and a face template image sample.
For example, the obtaining unit 301 may be specifically configured to receive a pair of face image samples sent by a terminal, where the pair of face image samples includes a face image sample and a face template image sample, or may obtain a face image sample from a network or an image database, send the face image sample to a template image server, and receive a face template image sample corresponding to each face image sample returned by the analog image server, and form a pair of face image samples from the face image sample and the face template image sample, or may obtain an original face image from the network or the image database, and screen out a face template image sample corresponding to the face image sample and the face image sample from the original face image sample, thereby obtaining a pair of face image samples, where the screening condition may be that the faces of the object in the face image sample and the face template image are different, or may further receive a face image processing request when the number of the pair of face image samples is greater or the memory is greater, where the storage address of the pair of face image samples is carried, and obtain the pair of face image samples according to the storage address.
(2) An extraction unit 302;
the extracting unit 302 is configured to perform feature extraction on the facial image sample by using a preset image processing model, so as to obtain a sample facial feature.
For example, the extracting unit 302 may be specifically configured to perform convolution feature extraction on the face image sample by using a feature extraction network of a preset image processing model to obtain an initial sample facial feature, and then process the initial sample facial feature by using a full-connection layer to obtain a 256-dimensional feature vector, and use the feature vector as the sample facial feature.
(3) A first replacement unit 303;
the first replacing unit 303 is configured to perform face replacement on the face image sample according to the sample facial features and the face template image sample, so as to obtain a replaced face image.
For example, the first replacing unit 303 may be specifically configured to perform feature extraction on a face template image by using a preset image processing model to obtain a face style feature, perform multi-size fusion on the face style feature and the sample facial feature to obtain a first fused sample facial feature under each size, and perform face replacement on the face template image sample based on the first fused sample facial feature to obtain a replaced face image.
(4) A reconstruction unit 304;
a reconstruction unit 304, configured to reconstruct a facial image sample based on the facial features of the sample, and obtain a reconstructed facial image.
For example, the reconstruction unit 304 may be specifically configured to perform feature extraction on a facial image sample by using a preset image processing model to obtain a sample facial style feature, perform multi-size fusion on the sample facial style and the sample facial feature to obtain a second fused sample under each size, and perform facial replacement on the facial image sample based on the second fused sample facial feature to obtain a reconstructed facial image sample.
(5) A second replacement unit 305;
the second replacing unit 305 is configured to converge the preset image processing model according to the face image sample, the replaced face image, and the reconstructed face image, obtain a trained image processing model, and replace the face of the to-be-processed face image with the trained image processing model.
For example, the second replacing unit 305 may be specifically configured to determine the perceptual loss information of the face image sample pair according to the face image sample, the replaced face image and the reconstructed face image, determine the adversarial loss information of the face image sample pair based on the replaced face image and the face image sample, fuse the perceptual loss information and the adversarial loss information, and converge the preset image processing model based on the fused loss information to obtain the trained image processing model, and then perform face replacement on the face image to be processed by using the trained image processing model.
Optionally, the second replacing unit 305 may further include an acquiring subunit 3051, an extracting subunit 3052, a constructing subunit 3053, a dividing subunit 3054, and a fusing subunit 3055, as shown in fig. 12, specifically may be as follows:
(1) An acquisition subunit 3051;
the acquiring subunit 3051 is configured to acquire a face image to be processed and a face template image corresponding to the face image to be processed.
For example, the acquiring subunit 3051 may be specifically configured to directly acquire a to-be-processed face image and a face template image sent by the terminal. Alternatively, it may acquire a to-be-processed face image and screen out the face template image corresponding to the to-be-processed face image from a face template image database; or it may acquire original face images from a network or an image database, screen a face image pair from the original face images, and use either one of the pair as the to-be-processed face image and the other as the face template image. Alternatively, when the number of to-be-processed face images and face template images is large or they occupy a large amount of storage, it may receive a face image processing request that carries the storage address of the to-be-processed face image and of the face template image corresponding to the to-be-processed face image, and acquire the to-be-processed face image and the corresponding face template image according to the storage address.
(2) An extraction subunit 3052;
and the extraction subunit 3052 is configured to perform feature extraction on the face image to be processed and the face template image by using the trained image processing model, so as to obtain the face feature of the face image to be processed and the face template feature of the face template image.
For example, the extracting subunit 3052 may specifically be configured to perform facial style feature extraction on a to-be-processed facial image by using a facial style feature extraction network of the trained image processing model to obtain facial features, and perform facial style feature extraction on a facial template image by using a facial style feature extraction network to obtain facial template features.
(3) Constructing subunit 3053;
the constructing subunit 3053 is configured to fuse the facial feature and the facial template feature, and construct a fused facial image based on the fused facial feature.
For example, the construction subunit 3053 may specifically be configured to convert facial features into a preset number of basic facial features, determine a convolution layer corresponding to each basic facial feature, convert facial template features into a preset number of basic facial template features, determine a convolution layer corresponding to each basic facial template feature, and fuse the basic facial features and the basic facial template features according to the convolution layers to obtain a fused facial feature. The fused facial features are input to a stylegan generator, resulting in a fused facial image.
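A minimal sketch of this layer-wise fusion is shown below, assuming the "basic" features are per-layer latent codes in the extended latent space of a StyleGAN-like generator and that identity is taken from the early (coarse) layers while later layers keep the template's style. The 18-layer layout, the split point and the generator interface are illustrative assumptions.

```python
import torch

def mix_latents(face_feat, template_feat, num_layers: int = 18, id_layers: int = 8):
    """face_feat / template_feat: (B, 512) latent vectors (assumed shapes)."""
    face_codes = face_feat.unsqueeze(1).repeat(1, num_layers, 1)          # basic facial features
    template_codes = template_feat.unsqueeze(1).repeat(1, num_layers, 1)  # basic facial template features
    fused = template_codes.clone()
    # Early layers take identity from the face image to be processed; later
    # layers keep the template's pose, lighting and background style.
    fused[:, :id_layers] = face_codes[:, :id_layers]
    return fused  # fused facial features, one code per convolution layer

# Hypothetical usage with a pretrained generator object:
# fused_face_image = stylegan_generator.synthesis(mix_latents(face_feat, template_feat))
```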
(4) A split subunit 3054;
the segmentation subunit 3054 is configured to segment the face region of the face template image and the fused face image to obtain a face region.
For example, the segmentation subunit 3054 may specifically be configured to perform multidimensional feature extraction on the fused face image to obtain multidimensional face region features, perform multidimensional feature extraction on the face template image to obtain multidimensional template face region features, segment the fused face region in the fused face image according to the face region features, segment the template face region in the face template image according to the template face region features, and use the fused face region and the template face region as the face regions.
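As an illustrative sketch, the face region segmentation could be realised with a face parsing network that outputs per-pixel class labels; the parsing model and the set of labels counted as part of the face are assumptions for this sketch.

```python
import torch

# Hypothetical label ids for skin, brows, eyes, nose and lips in a parsing map.
FACE_LABELS = {1, 2, 3, 4, 5, 10, 12, 13}

def face_region_mask(parsing_net, image):
    """Return a binary mask of the face region in `image`."""
    logits = parsing_net(image)        # multi-dimensional face region features
    labels = logits.argmax(dim=1)      # per-pixel class map
    mask = torch.zeros_like(labels, dtype=torch.bool)
    for lbl in FACE_LABELS:
        mask |= labels == lbl
    return mask

# fused_face_region = face_region_mask(parsing_net, fused_face_image)
# template_face_region = face_region_mask(parsing_net, face_template_image)
```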
(5) A fusion subunit 3055;
and a fusion subunit 3055, configured to perform face fusion on the fused face image and the face template image based on the face area, so as to obtain a target face image.
For example, the fusion subunit 3055 may specifically be configured to perform color correction on the fused face image according to the fused face area and the template face area to obtain a corrected fused face image, detect a face key point in the face template image to obtain a first face key point, detect a face key point in the corrected fused face image to obtain a second face key point, and perform face fusion on the fused face image and the face template image according to the first face key point and the second face key point to obtain the target face image.
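The sketch below shows simple stand-ins for two of the fusion steps named above: colour correction as channel-statistics matching between the two face regions, and the final composite as a mask-weighted blend. The key point detection and warping step is omitted, and both operations are assumptions, since the text names the steps but not the exact operations.

```python
import numpy as np

def color_correct(fused_img, template_img, fused_mask, template_mask, eps=1e-6):
    """Match per-channel mean/std of the fused face region to the template's."""
    out = fused_img.astype(np.float32)
    for c in range(3):
        src = out[..., c][fused_mask]
        ref = template_img[..., c][template_mask].astype(np.float32)
        out[..., c] = (out[..., c] - src.mean()) / (src.std() + eps) * ref.std() + ref.mean()
    return np.clip(out, 0, 255).astype(np.uint8)

def blend(corrected_fused, template_img, fused_mask):
    """Composite the colour-corrected face region into the template image."""
    alpha = fused_mask.astype(np.float32)[..., None]
    return (alpha * corrected_fused + (1 - alpha) * template_img).astype(np.uint8)
```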
In specific implementations, the above units may be implemented as independent entities, or may be combined arbitrarily and implemented as the same entity or as several entities; for the specific implementation of each unit, reference may be made to the foregoing method embodiments, which are not described herein again.
As can be seen from the foregoing, in the embodiment of the present application, after the obtaining unit 301 obtains at least one facial image sample pair, the extracting unit 302 performs feature extraction on the facial image sample by using the preset image processing model to obtain a sample facial feature, the first replacing unit 303 performs face replacement on the facial template image sample according to the sample facial feature to obtain a replaced facial image, then the reconstructing unit 304 reconstructs the facial image sample based on the sample facial feature to obtain a reconstructed facial image, and then the second replacing unit 305 converges the preset image processing model according to the facial image sample, the replaced facial image and the reconstructed facial image to obtain a trained image processing model, and performs face replacement on the facial image to be processed by using the trained image processing model; according to the scheme, after the feature extraction is carried out on the facial image sample, the facial image sample can be reconstructed, so that a triplet data set of the facial image sample, the facial template image sample and the reconstructed facial image is formed, and the preset image processing model is converged based on the triplet data, so that the stability and the processing precision of image processing training are greatly improved, and the accuracy of image processing can be improved.
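To make the triplet-based convergence described above concrete, the following sketch outlines one training step; the model interface (`encode`, `swap`), the optimizer and the reuse of `training_loss` from the earlier loss sketch are assumptions used only to show the control flow.

```python
def train_step(model, vgg_features, discriminator, optimizer,
               face_sample, template_sample):
    id_feat = model.encode(face_sample)               # sample facial features
    replaced = model.swap(template_sample, id_feat)   # replaced face image
    reconstructed = model.swap(face_sample, id_feat)  # reconstructed face image
    # Converge the model on the (sample, replaced, reconstructed) triplet.
    loss = training_loss(vgg_features, discriminator,
                         face_sample, replaced, reconstructed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```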
The embodiment of the invention also provides an electronic device, as shown in fig. 13, which shows a schematic structural diagram of the electronic device according to the embodiment of the invention, specifically:
the electronic device may include one or more processing cores 'processors 401, one or more computer-readable storage media's memory 402, power supply 403, and input unit 404, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 13 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402, and calling data stored in the memory 402. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
obtaining at least one face image sample pair, wherein the face image sample pair comprises a face image sample and a face template image sample, performing feature extraction on the face image sample by adopting a preset image processing model to obtain sample face features, performing face replacement on the face image sample according to the sample face features and the face template image sample to obtain a replaced face image, reconstructing the face image sample based on the sample face features to obtain a reconstructed face image, converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and performing face replacement on the face image to be processed by adopting the trained image processing model.
For example, the electronic device obtains a face image sample pair including a face image sample and a face template image sample. Convolution feature extraction is performed on the face image sample by using the feature extraction network of the preset image processing model to obtain initial sample facial features, the initial sample facial features are then processed by a fully connected layer to obtain a 256-dimensional feature vector, and this feature vector is used as the sample facial features. Feature extraction is performed on the face template image sample by using the preset image processing model to obtain facial style features, the facial style features and the sample facial features are fused at multiple sizes to obtain first fused sample facial features at each size, and face replacement is performed on the face template image sample based on the first fused sample facial features to obtain a replaced face image. Feature extraction is performed on the face image sample by using the preset image processing model to obtain sample facial style features, the sample facial style features and the sample facial features are fused at multiple sizes to obtain second fused sample facial features at each size, and face replacement is performed on the face image sample based on the second fused sample facial features to obtain a reconstructed face image. Perceptual loss information of the face image sample pair is determined according to the face image sample, the replaced face image and the reconstructed face image, adversarial loss information of the face image sample pair is determined based on the replaced face image and the face image sample, the perceptual loss information and the adversarial loss information are fused, and the preset image processing model is converged based on the fused loss information to obtain the trained image processing model.
The electronic device acquires a face image to be processed and a face template image corresponding to the face image to be processed. Facial style feature extraction is performed on the face image to be processed by using the facial style feature extraction network of the trained image processing model to obtain facial features, and facial style feature extraction is performed on the face template image by using the same network to obtain facial template features. The facial features are converted into a preset number of basic facial features and the convolution layer corresponding to each basic facial feature is determined; the facial template features are converted into a preset number of basic facial template features and the convolution layer corresponding to each basic facial template feature is determined; the basic facial features and the basic facial template features are then fused according to the convolution layers to obtain fused facial features. The fused facial features are input to a stylegan generator, resulting in a fused facial image. Multidimensional feature extraction is performed on the fused face image to obtain multi-dimensional facial region features, multidimensional feature extraction is performed on the face template image to obtain multi-dimensional template facial region features, the fused face region is segmented from the fused face image according to the facial region features, the template face region is segmented from the face template image according to the template facial region features, and the fused face region and the template face region are taken as the face regions. Color correction is performed on the fused face image according to the fused face region and the template face region to obtain a corrected fused face image, face key points in the face template image are detected to obtain first face key points, face key points in the corrected fused face image are detected to obtain second face key points, and face fusion is performed on the fused face image and the face template image according to the first face key points and the second face key points to obtain a target face image.
The specific implementation of each operation may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, in the embodiment of the present invention, after obtaining at least one facial image sample pair, performing feature extraction on a facial image sample by using a preset image processing model to obtain a sample facial feature, performing facial replacement on a facial template image sample according to the sample facial feature to obtain a replaced facial image, reconstructing the facial image sample based on the sample facial feature to obtain a reconstructed facial image, and then converging the preset image processing model according to the facial image sample, the replaced facial image and the reconstructed facial image to obtain a trained image processing model, and performing facial replacement on the facial image to be processed by using the trained image processing model; according to the scheme, after the feature extraction is carried out on the facial image sample, the facial image sample can be reconstructed, so that a triplet data set of the facial image sample, the facial template image sample and the reconstructed facial image is formed, and the preset image processing model is converged based on the triplet data, so that the stability and the processing precision of image processing training are greatly improved, and the accuracy of image processing can be improved.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the image processing methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
obtaining at least one face image sample pair, wherein the face image sample pair comprises a face image sample and a face template image sample, performing feature extraction on the face image sample by adopting a preset image processing model to obtain sample face features, performing face replacement on the face image sample according to the sample face features and the face template image sample to obtain a replaced face image, reconstructing the face image sample based on the sample face features to obtain a reconstructed face image, converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and performing face replacement on the face image to be processed by adopting the trained image processing model.
For example, a face image sample pair including a face image sample and a face template image sample is acquired. Convolution feature extraction is performed on the face image sample by using the feature extraction network of the preset image processing model to obtain initial sample facial features, the initial sample facial features are then processed by a fully connected layer to obtain a 256-dimensional feature vector, and this feature vector is used as the sample facial features. Feature extraction is performed on the face template image sample by using the preset image processing model to obtain facial style features, the facial style features and the sample facial features are fused at multiple sizes to obtain first fused sample facial features at each size, and face replacement is performed on the face template image sample based on the first fused sample facial features to obtain a replaced face image. Feature extraction is performed on the face image sample by using the preset image processing model to obtain sample facial style features, the sample facial style features and the sample facial features are fused at multiple sizes to obtain second fused sample facial features at each size, and face replacement is performed on the face image sample based on the second fused sample facial features to obtain a reconstructed face image. Perceptual loss information of the face image sample pair is determined according to the face image sample, the replaced face image and the reconstructed face image, adversarial loss information of the face image sample pair is determined based on the replaced face image and the face image sample, the perceptual loss information and the adversarial loss information are fused, and the preset image processing model is converged based on the fused loss information to obtain the trained image processing model.
A face image to be processed and a face template image corresponding to the face image to be processed are acquired. Facial style feature extraction is performed on the face image to be processed by using the facial style feature extraction network of the trained image processing model to obtain facial features, and facial style feature extraction is performed on the face template image by using the same network to obtain facial template features. The facial features are converted into a preset number of basic facial features and the convolution layer corresponding to each basic facial feature is determined; the facial template features are converted into a preset number of basic facial template features and the convolution layer corresponding to each basic facial template feature is determined; the basic facial features and the basic facial template features are then fused according to the convolution layers to obtain fused facial features. The fused facial features are input to a stylegan generator, resulting in a fused facial image. Multidimensional feature extraction is performed on the fused face image to obtain multi-dimensional facial region features, multidimensional feature extraction is performed on the face template image to obtain multi-dimensional template facial region features, the fused face region is segmented from the fused face image according to the facial region features, the template face region is segmented from the face template image according to the template facial region features, and the fused face region and the template face region are taken as the face regions. Color correction is performed on the fused face image according to the fused face region and the template face region to obtain a corrected fused face image, face key points in the face template image are detected to obtain first face key points, face key points in the corrected fused face image are detected to obtain second face key points, and face fusion is performed on the fused face image and the face template image according to the first face key points and the second face key points to obtain a target face image.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
Because the instructions stored in the computer readable storage medium may execute the steps in any one of the image processing methods provided in the embodiments of the present application, the beneficial effects that any one of the image processing methods provided in the embodiments of the present application can achieve are detailed in the previous embodiments, and are not described herein.
Wherein according to an aspect of the application, a computer program product or a computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, which executes the computer instructions, causing the computer device to perform the methods provided in the above-described facial image processing aspects or various alternative implementations of facial image processing aspects.
The foregoing has described in detail the methods, apparatuses, electronic devices and computer readable storage medium for image processing according to the embodiments of the present invention, and specific examples have been applied to illustrate the principles and embodiments of the present invention, where the foregoing examples are provided to assist in understanding the methods and core ideas of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present invention, the present description should not be construed as limiting the present invention.

Claims (20)

1. An image processing method, comprising:
acquiring at least one facial image sample pair, the facial image sample pair comprising a facial image sample and a facial template image sample;
extracting features of the facial image sample by adopting a preset image processing model to obtain sample facial features;
performing face replacement on the face template image sample according to the sample face characteristics to obtain a replaced face image;
reconstructing the facial image sample based on the sample facial features to obtain a reconstructed facial image;
And converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and adopting the trained image processing model to replace the face of the face image to be processed.
2. The image processing method according to claim 1, wherein said performing face replacement on the face template image sample based on the sample face features to obtain a replaced face image comprises:
extracting features of the facial template image sample by adopting the preset image processing model to obtain facial style features;
performing multi-size fusion on the facial style characteristics and the sample facial characteristics to obtain first fused sample facial characteristics under each size;
and carrying out face replacement on the face template image sample based on the face characteristics of the first fused sample to obtain a replaced face image.
3. The image processing method according to claim 2, wherein the performing multi-size fusion of the facial style features and the sample facial features to obtain a first fused sample facial feature at each size includes:
Extracting basic facial style characteristics under each size from the facial style characteristics, and extracting basic sample facial characteristics under each size from the sample facial characteristics;
calculating style feature parameters of the basic facial style features and facial feature parameters of the basic sample facial features;
and fusing the basic facial style characteristics and the basic sample facial characteristics based on the style characteristic parameters and the facial characteristic parameters to obtain first fused sample facial characteristics under each size.
4. The image processing method according to claim 3, wherein the fusing the basic facial style feature and the basic sample facial feature based on the style feature parameter and the facial feature parameter to obtain the first fused sample facial feature at each size includes:
based on the style feature parameters and the face feature parameters, fusing the basic face style features and the basic sample face features under the same size to obtain initial fused sample face features under each size;
determining a face area mask corresponding to the face image sample under each size according to the initial fused sample face features, wherein the face area mask is used for indicating attention weights among the initial fused face features;
And fusing the facial region mask with the initial fused sample facial features to obtain first fused sample facial features in each size.
5. The image processing method according to claim 2, wherein the performing face replacement on the face template image sample based on the first fused face feature to obtain a replaced face image includes:
generating a basic facial image with a size corresponding to each first fused facial feature according to the first fused sample facial features;
sorting the basic face images according to the sizes of the basic face images to obtain sorting information;
and adjusting the size of the basic face image according to the ordering information to obtain a replaced face image.
6. The image processing method according to claim 5, wherein the adjusting the size of the basic face image according to the ranking information to obtain the replaced face image includes:
screening out a basic face image with the smallest size from the basic face images to obtain a current basic face image;
amplifying the size of the current basic face image to obtain an amplified basic face image;
Screening a basic face image next to the current basic face image from the basic face images according to the ordering information to obtain a target basic face image;
and fusing the amplified basic face image and the target basic face image to obtain the replaced face image.
7. The image processing method according to claim 6, wherein the fusing the enlarged base face image and the target base face image to obtain the replaced face image includes:
fusing the amplified basic face image and the target basic face image to obtain a fused basic face image;
amplifying the size of the fused basic face image to obtain a target amplified basic face image;
taking the target basic face image as the current basic face image, and taking the target amplified basic face image as the amplified basic face image;
and returning to the step of screening out the basic face image next to the current basic face image from the basic face images according to the sorting information until the last basic face image is screened out, and fusing the screened target basic face image with the amplified basic face image to obtain the replaced face image.
8. The image processing method according to any one of claims 1 to 7, wherein the converging the preset image processing model according to the face image sample, the replaced face image, and the reconstructed face image to obtain a trained image processing model includes:
determining reconstructed perceptual loss information of the face image sample pair according to the face image sample and the reconstructed face image;
determining label perceptual loss information of the face image sample pair according to the face image sample and the replaced face image;
fusing the reconstructed perceptual loss information and the label perceptual loss information to obtain perceptual loss information of the face image sample pair;
determining adversarial loss information of the face image sample pair based on the replaced face image and the face image sample;
and fusing the perceptual loss information and the adversarial loss information, and converging the preset image processing model based on the fused loss information to obtain a trained image processing model.
9. The image processing method according to claim 1, wherein said performing face replacement on the face image to be processed using the trained image processing model comprises:
Acquiring a face image to be processed and a face template image corresponding to the face image to be processed;
respectively extracting features of a face image to be processed and a face template image by adopting the trained image processing model to obtain the face features of the face image to be processed and the face template features of the face template image;
fusing the facial features and the facial template features, and constructing a fused facial image based on the fused facial features;
performing face region segmentation on the face template image and the fused face image to obtain a face region;
and carrying out face fusion on the fused face image and the face template image based on the face area to obtain a target face image.
10. The image processing method according to claim 9, wherein the fusing the facial features and facial template features includes:
converting the facial features into a preset number of basic facial features, and determining a convolution layer corresponding to each basic facial feature;
converting the facial template features into the preset number of basic facial template features, and determining a convolution layer corresponding to each basic facial template feature;
And according to the convolution layer, fusing the basic facial features and the basic facial template features to obtain fused facial features.
11. The image processing method according to claim 10, wherein the fusing the basic facial features and basic facial template features according to the convolution layer to obtain fused facial features includes:
screening basic facial features corresponding to a preset first convolution layer from the basic facial features to obtain target basic facial features;
screening basic facial template characteristics corresponding to a preset second convolution layer from the basic facial template characteristics to obtain target basic facial template characteristics;
and adopting the convolution layer to fuse the target basic facial features and the target basic facial template features to obtain fused facial features.
12. The image processing method according to any one of claims 9 to 11, wherein the performing face region segmentation on the face template image and the fused face image to obtain a face region includes:
extracting multidimensional features from the fused facial images to obtain facial region features with multiple dimensions;
Carrying out multidimensional feature extraction on the face template image to obtain template face region features with multiple dimensions;
segmenting a fused face area in the fused face image according to the facial area characteristics;
and dividing a template face area in the face template image according to the template face area characteristics, and taking the fused face area and the template face area as the face area.
13. The image processing method according to claim 9, wherein the performing face fusion on the fused face image and the face template image based on the face region to obtain a target face image includes:
performing color correction on the fused face image according to the fused face area and the template face area to obtain a corrected fused face image;
detecting a face key point in the face template image to obtain a first face key point, and detecting a face key point in the corrected fused face image to obtain a second face key point;
and carrying out face fusion on the fused face image and the face template image according to the first face key point and the second face key point to obtain a target face image.
14. The method of claim 13, wherein detecting the facial keypoints in the facial template image results in a first facial keypoint, comprising:
extracting features of the face template image to obtain light-weight face key point features with multiple dimensions;
fusing the light-weight facial key point features to obtain facial key point features;
and identifying the facial key points in the facial template image according to the facial key point characteristics to obtain first facial key points.
15. The image processing method according to claim 14, wherein the performing face fusion on the fused face image and the face template image according to the first face key point and the second face key point to obtain a target face image includes:
comparing the first facial key points with the second facial key points to obtain key point deformation information;
adjusting the lightweight facial key point features based on the key point deformation information;
determining interpolation information for deforming the first facial key points to the second facial key points according to the adjusted facial key point features;
deforming, based on the interpolation information, the abnormal key points at which the first facial key points and the second facial key points differ, to obtain a deformed face template image;
and fusing the fused face image and the deformed face template image to obtain a target face image.
16. An image processing apparatus, comprising:
an acquisition unit configured to acquire at least one pair of face image samples including a face image sample and a face template image sample;
the extraction unit is used for carrying out feature extraction on the facial image sample by adopting a preset image processing model to obtain sample facial features;
a first replacing unit, configured to perform face replacement on the face image sample according to the sample facial features and the face template image sample, to obtain a replaced face image;
a reconstruction unit, configured to reconstruct the facial image sample based on the sample facial feature, to obtain a reconstructed facial image;
and the second replacing unit is used for converging the preset image processing model according to the face image sample, the replaced face image and the reconstructed face image to obtain a trained image processing model, and adopting the trained image processing model to replace the face of the face image to be processed.
17. The image processing apparatus according to claim 16, wherein the second replacing unit includes:
an acquisition subunit, configured to acquire a face image to be processed and a face template image corresponding to the face image to be processed;
an extraction subunit, configured to perform feature extraction on a to-be-processed face image and a face template image by using the trained image processing model, so as to obtain a face feature of the to-be-processed face image and a face template feature of the face template image;
the construction subunit is used for fusing the facial features and the facial template features and constructing a fused facial image based on the fused facial features;
a segmentation subunit, configured to segment the face template image and the fused face image to obtain a face area;
and the fusion subunit is used for carrying out face fusion on the fusion face image and the face template image based on the face area to obtain a target face image.
18. An electronic device comprising a processor and a memory, the memory storing an application, the processor being configured to run the application in the memory to perform the steps in the image processing method of any of claims 1 to 15.
19. A computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the image processing method of any of claims 1 to 15.
20. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the image processing method of any one of claims 1 to 15.
CN202210121870.9A 2022-02-09 Image processing method, apparatus, electronic device, and computer-readable storage medium (Pending)

Priority Application (1)

Application Number: CN202210121870.9A; Priority Date: 2022-02-09; Filing Date: 2022-02-09; Title: Image processing method, apparatus, electronic device, and computer-readable storage medium

Publication (1)

Publication Number: CN116630138A; Publication Date: 2023-08-22

Family ID: 87625242

Country Status (1)

Country: CN; Document: CN116630138A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40096779; Country of ref document: HK)