WO2023040679A1 - Method, apparatus, device, and storage medium for fusing face pictures

Method, apparatus, device, and storage medium for fusing face pictures

Info

Publication number
WO2023040679A1
WO2023040679A1 (PCT/CN2022/116786, CN2022116786W)
Authority
WO
WIPO (PCT)
Prior art keywords
network
hidden code
face picture
face
identity
Prior art date
Application number
PCT/CN2022/116786
Other languages
English (en)
French (fr)
Inventor
陶洪
李玉乐
项伟
Original Assignee
百果园技术(新加坡)有限公司
陶洪
Priority date
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 陶洪
Publication of WO2023040679A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Definitions

  • The present application relates to the technical field of machine learning, and in particular to a method, apparatus, device, and storage medium for fusing face pictures.
  • Face fusion refers to the process of fusing two face pictures into a single face picture.
  • The face obtained through face fusion exhibits characteristics of the faces in both pictures at the same time.
  • Face fusion technology is widely used in photo retouching, video editing, and other fields.
  • In the related art, a triangulation method is used to partition the source face picture and the target face picture and obtain a fused picture, as illustrated by the sketch below.
  • Points on the contours of the facial features are used as feature points, and points on the edge of the picture and on the face contour line are selected as anchor points; the anchor points are connected with the feature points, and several triangulation partitions are obtained according to a triangulation algorithm.
  • For each triangulation partition of the source face picture, the corresponding triangulation partition on the target face picture is found; a mapping transformation is performed on the two partitions to obtain a fused triangulation partition, and the pixel values of the fused partition are determined based on the pixel values of the two partitions; a fused face picture is then generated based on all fused triangulation partitions.
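  • The following is an illustrative sketch (not taken from this publication) of such a triangulation-based fusion using OpenCV and SciPy; the landmark inputs, the blending weight alpha, and the helper structure are assumptions made here for illustration only.

```python
# Illustrative sketch of the related-art triangulation approach described above:
# landmarks are triangulated, each source/target triangle pair is affine-mapped onto
# the fused triangle, and the two warped patches are blended. Landmark detection and
# the blending weight alpha are assumptions; this is not the patent's own method.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def fuse_by_triangulation(src, tgt, pts_src, pts_tgt, alpha=0.5):
    """src, tgt: images of the same size; pts_src, pts_tgt: (K, 2) float landmark arrays."""
    pts_fused = (1 - alpha) * pts_src + alpha * pts_tgt     # fused vertex positions
    tris = Delaunay(pts_fused).simplices                    # triangulation partitions
    out = np.zeros_like(src, dtype=np.float32)
    for t in tris:
        dst_tri = np.float32(pts_fused[t])
        for img, pts, w in ((src, pts_src, 1 - alpha), (tgt, pts_tgt, alpha)):
            M = cv2.getAffineTransform(np.float32(pts[t]), dst_tri)   # mapping transformation
            warped = cv2.warpAffine(img.astype(np.float32), M, (src.shape[1], src.shape[0]))
            mask = np.zeros(src.shape[:2], dtype=np.uint8)
            cv2.fillConvexPoly(mask, np.int32(dst_tri), 1)
            out[mask == 1] += w * warped[mask == 1]          # blend pixel values per triangle
    return np.clip(out, 0, 255).astype(np.uint8)
```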
  • Embodiments of the present application provide a face picture fusion method, apparatus, device, and storage medium.
  • The technical solution is as follows:
  • According to one aspect, a face picture fusion method is provided. The method is executed by a computer device and includes: acquiring a source face picture and a target face picture; obtaining an identity feature hidden code of the source face picture, where the identity feature hidden code is used to characterize the identity features of the person in the source face picture; obtaining an attribute feature hidden code of the target face picture, where the attribute feature hidden code is used to characterize the attribute features of the person in the target face picture; and performing fusion based on the identity feature hidden code and the attribute feature hidden code to generate a fused face picture.
  • According to another aspect, a training method for a face fusion model is provided. The method is executed by a computer device; the face fusion model includes a generation network and a discrimination network, and the generation network includes an identity encoding network, an attribute encoding network, and a decoding network. The method includes:
  • acquiring training samples of the face fusion model, where a training sample includes a source face picture sample and a target face picture sample;
  • obtaining an identity feature hidden code of the source face picture sample through the identity encoding network, where the identity feature hidden code is used to characterize the identity features of the person in the source face picture sample;
  • obtaining an attribute feature hidden code of the target face picture sample through the attribute encoding network;
  • performing fusion based on the identity feature hidden code and the attribute feature hidden code through the decoding network to generate a fused face picture sample;
  • determining, through the discrimination network, whether a sample to be discriminated is generated by the generation network, where the sample to be discriminated includes the fused face picture sample.
  • According to another aspect, a face picture fusion apparatus is provided, the apparatus including:
  • a face picture acquisition module, configured to acquire a source face picture and a target face picture;
  • the identity feature acquisition module is configured to obtain the identity feature hidden code of the source face picture, and the identity feature hidden code is used to characterize the identity feature of the person in the source face picture;
  • the attribute feature acquisition module is configured to acquire the attribute feature hidden code of the target face picture, and the attribute feature hidden code is used to characterize the character attribute feature in the target face picture;
  • a fused picture generating module configured to fuse based on the identity feature hidden code and the attribute feature hidden code to generate a fused face picture.
  • According to another aspect, a training apparatus for a face fusion model is provided. The face fusion model includes a generation network and a discrimination network, and the generation network includes an identity encoding network, an attribute encoding network, and a decoding network.
  • The apparatus includes:
  • Training sample obtaining module configured to obtain the training sample of human face fusion model, described training sample comprises source human face picture sample and target human face picture sample;
  • the identity feature acquisition module is configured to obtain the identity feature hidden code of the source face picture sample through the identity coding network, and the identity feature hidden code is used to characterize the identity feature of the person in the source face picture sample;
  • the attribute feature acquisition module is configured to obtain the attribute feature hidden code of the target face picture sample through the attribute encoding network, and the attribute feature hidden code is used to characterize the character attribute feature in the target face picture sample;
  • a fusion picture generation module configured to fuse based on the identity feature hidden code and the attribute feature hidden code through the decoding network to generate a fusion face picture sample
  • the human face picture discrimination module is configured to determine whether the samples to be discriminated are generated by the generation network through the discrimination network, and the samples to be discriminated include the fusion human face picture samples;
  • a first parameter adjustment module configured to determine a discriminant network loss based on the discriminant network's discriminative result, and adjust parameters in the discriminant network based on the discriminative network loss;
  • the second parameter adjustment module is configured to determine a generation network loss based on the fused face picture sample, the source face picture sample, the target face picture sample, and the discrimination result of the discrimination network, and to adjust the parameters in the generation network based on the generation network loss.
  • According to another aspect, a computer device is provided. The computer device includes a processor and a memory; a computer program is stored in the memory, and the processor executes the computer program to implement the above face picture fusion method or the above training method for the face fusion model.
  • According to another aspect, a computer-readable storage medium is provided. A computer program is stored in the storage medium, and the computer program is executed by a processor to implement the above face picture fusion method or the above training method for the face fusion model.
  • According to another aspect, a computer program product is provided. When the computer program product runs on a computer device, the computer device is caused to execute the above face picture fusion method or the above training method for the face fusion model.
  • The technical solutions provided by the embodiments of the present application can generate a clear and realistic fused face picture even when features such as face angle and skin color differ greatly between the source face picture and the target face picture.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
  • FIG. 2 is a flowchart of a face picture fusion method provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a face picture fusion method provided by another embodiment of the present application;
  • FIG. 4 is a flowchart of a training method for a face fusion model provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a training method for a face fusion model provided by an embodiment of the present application;
  • FIG. 6 is a block diagram of a face picture fusion apparatus provided by an embodiment of the present application;
  • FIG. 7 is a block diagram of a training apparatus for a face fusion model provided by another embodiment of the present application;
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • Computer Vision refers to the automatic extraction, analysis and understanding of useful information by a computer from an image or a series of pictures.
  • The fields covered by computer vision technology include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, motion estimation, image restoration, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition, and technologies such as face fusion.
  • A Generative Adversarial Network (GAN) consists of a generative neural network and a discriminative neural network.
  • The generative neural network is used to process input data and produce generated data.
  • The discriminative neural network is used to distinguish real data from generated data.
  • The generative neural network and the discriminative neural network confront each other: the generative neural network adjusts its own network parameters according to the generative network loss function, so that the data it generates can mislead the judgment of the discriminative neural network.
  • The discriminative neural network adjusts its own network parameters according to the discriminative network loss function, so that it can correctly distinguish real data from generated data.
  • After sufficient training, the data produced by the generative neural network is close to the real data, and the discriminator cannot tell generated data from real data.
  • Affine Transformation: in geometry, an affine transformation applies a linear transformation to a vector space followed by a translation to obtain a new vector space.
  • For a two-dimensional point (x, y), the transformed point (u, v) is given by u = a1*x + b1*y + c1 and v = a2*x + b2*y + c2.
  • Operations such as translation, scaling, and rotation of two-dimensional images can be realized through affine transformation.
  • An affine transformation preserves the straightness and parallelism of a two-dimensional image.
  • Straightness means that a straight line is still a straight line after the affine transformation, and an arc is still an arc after the affine transformation.
  • Parallelism means that the relative positional relationship between straight lines remains unchanged after the affine transformation, and the relative positions of points on a straight line do not change after the affine transformation.
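  • The following minimal numpy sketch (an illustration added here, not part of the original text) applies the two-dimensional affine transform defined above and checks that collinear points remain collinear; the specific coefficient values are arbitrary.

```python
# A minimal numpy sketch of the 2D affine transform u = a1*x + b1*y + c1,
# v = a2*x + b2*y + c2; the coefficient values below are illustrative only.
import numpy as np

A = np.array([[0.9, -0.2],   # [[a1, b1],
              [0.3,  1.1]])  #  [a2, b2]]  -- linear part (scale / rotate / shear)
t = np.array([5.0, -3.0])    # [c1, c2]    -- translation part

def affine(points):
    """Apply the affine transform to an (N, 2) array of points."""
    return points @ A.T + t

# Points on one straight line stay on one straight line (straightness),
# and parallel lines stay parallel (parallelism).
xs = np.linspace(0, 10, 5)
line = np.stack([xs, 2 * xs + 1], axis=1)
mapped = affine(line)
d = np.diff(mapped, axis=0)
print(np.allclose(np.cross(d[:-1], d[1:]), 0))  # True: collinearity preserved
```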
  • AdaIN (Adaptive Instance Normalization) is an adaptive instance normalization operation.
  • The AdaIN operation takes a content feature x and a style feature y as input, and matches the channel-wise mean and variance of x to those of y according to the following formula:
  • AdaIN(x, y) = σ(y) * ((x - μ(x)) / σ(x)) + μ(y)
  • AdaIN achieves style transfer in feature space by transferring feature statistics, namely mean and variance in the channel direction.
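  • The following is a minimal PyTorch sketch of the AdaIN operation defined above, assuming feature maps of shape (N, C, H, W); it is an illustration added here, not code from this publication.

```python
# A minimal PyTorch sketch of AdaIN as defined above: the channel-wise mean and
# variance of the content feature x are replaced by those of the style feature y.
import torch

def adain(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """x, y: feature maps of shape (N, C, H, W); statistics are taken per channel."""
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    sigma_x = x.std(dim=(2, 3), keepdim=True) + eps
    mu_y = y.mean(dim=(2, 3), keepdim=True)
    sigma_y = y.std(dim=(2, 3), keepdim=True)
    return sigma_y * (x - mu_x) / sigma_x + mu_y
```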
  • FIG. 1 shows a schematic diagram of a solution implementation environment provided by an embodiment of the present application.
  • The implementation environment of this solution may be implemented as a face fusion system.
  • the system framework of this solution may include a server 10 and at least one terminal device 20 .
  • the terminal device 20 may be an electronic device such as a mobile phone, a tablet computer, a PC (Personal Computer, personal computer), a smart TV, or a multimedia playback device.
  • A target application program runs on the terminal device 20, and the face fusion model is carried in the target application program.
  • The target application program may be a camera application, a video application, a social application, or the like; the type of the target application program is not limited here.
  • In some embodiments, the target application program is deployed on the terminal device 20, and the face picture fusion process is performed on the terminal device.
  • The terminal device acquires the source face picture and the target face picture, extracts the identity feature hidden code from the source face picture, extracts the attribute feature hidden code from the target face picture, and fuses the identity feature hidden code and the attribute feature hidden code to generate a fused face picture, completing the face picture fusion process.
  • the server 10 is a background server that can run target applications.
  • the server 10 can be one server, or a server cluster composed of multiple servers, or a cloud computing service center.
  • The face picture fusion process may also be performed on the server 10: the terminal device 20 uploads the acquired source face picture and target face picture to the server 10.
  • The server 10 extracts the identity feature hidden code from the source face picture and the attribute feature hidden code from the target face picture, fuses the identity feature hidden code and the attribute feature hidden code to generate a fused face picture, and sends the generated fused picture to the terminal device 20 to complete the face picture fusion process.
  • Communication between the terminal device 20 and the server 10 can be performed through a network.
  • FIG. 2 shows a flow chart of a method for merging human face pictures provided by an embodiment of the present application.
  • The execution subject of each step of the method may be the terminal device 20 in the implementation environment shown in FIG. 1, or the server 10.
  • For ease of description, the computer device is used as the execution subject below, and the method may include at least one of the following steps (210-240):
  • Step 210: Acquire the source face picture and the target face picture.
  • the source face picture refers to the face picture that needs to be transformed according to a certain style.
  • the source face picture is generally a real picture provided by the user, such as a picture of a person taken by the user with a mobile phone, a camera and other tools.
  • the target face picture refers to a face picture that can provide a style change for the source face picture.
  • the target face picture can be a face picture provided by an application on the terminal device, or a face picture uploaded by a user. In the embodiment of the present application, there is no limitation on the way of acquiring the source face picture and the target face picture.
  • Step 220: Obtain the identity feature hidden code of the source face picture, where the identity feature hidden code is used to characterize the identity features of the person in the source face picture.
  • The identity feature hidden code is used to represent the shapes of the facial features in the source face picture, the relative positions between the facial features, the shape of the face, and so on; these features are related to the identity of the person. That is, different faces usually have different facial-feature shapes, different relative positions between the facial features, and different face shapes, so different identity feature hidden codes are obtained from different source face pictures.
  • the identity feature hidden code is obtained by encoding the source face picture through an identity encoding network.
  • Step 230: Obtain the attribute feature hidden code of the target face picture, where the attribute feature hidden code is used to characterize the attribute features of the person in the target face picture.
  • the character attribute features in the target face picture include but are not limited to at least one of the following: features such as face makeup, face skin color, character hairstyle, accessories, and head posture in the target face picture.
  • the head pose feature of the target face picture refers to the mapping of the deflection angle of the target face in the two-dimensional picture in the three-dimensional space.
  • the target face refers to the face in the target face picture.
  • the head pose of the target face includes Pitch angle (pitch), yaw angle (yaw) and rotation angle (roll), for example, in the case of facing the camera, the pitch angle, yaw angle and rotation angle of the head pose of the target face picture are all 0° .
  • the attribute feature hidden code is obtained by encoding the target face picture through an attribute encoding network.
  • Obtaining the identity feature hidden code of the source face picture and obtaining the attribute feature hidden code of the target face picture are carried out in two different encoding networks, so the two operations may be performed simultaneously or sequentially, which is not limited in this application.
  • Step 240: Perform fusion based on the identity feature hidden code and the attribute feature hidden code to generate a fused face picture.
  • A fused face picture refers to a picture that has both the identity features of the source face picture and the attribute features of the target face picture, so that its overall appearance is closer to the target face picture.
  • In some embodiments, the face fusion model includes an identity encoding network and an attribute encoding network.
  • The face fusion model performs fusion based on the identity feature hidden code and the attribute feature hidden code to generate a fused face picture.
  • In the technical solution provided by the embodiments of the present application, the source face picture and the target face picture are acquired; the identity feature hidden code is obtained based on the source face picture and the attribute feature hidden code is obtained based on the target face picture; and the identity feature hidden code and the attribute feature hidden code are fused to obtain a natural and realistic fused face picture.
  • In the related art, the fused face picture is obtained by fusing the triangulation partitions corresponding to the source face picture and the target face picture; some features in the fused picture are jointly affected by the source face picture and the target face picture, so the corresponding features in the fused face picture are unrealistic, and the authenticity of the face in the fused picture is poor.
  • In the embodiments of the present application, the identity feature hidden code is obtained from the source face picture, and the attribute feature hidden code is obtained from the target face picture.
  • The identity feature hidden code is used to control the identity features of the face generated in the fused face picture, and the attribute feature hidden code is used to control the attribute features of the generated face, which avoids the situation in which the generated fused face picture is unrealistic when the facial features of the source face picture differ greatly from those of the target face picture.
  • FIG. 3 shows a schematic diagram of a fusion method for human face pictures provided by another embodiment of the present application.
  • the fusion face picture is generated by a face fusion model, and the face fusion model includes an identity encoding network, an attribute encoding network and a decoding network; wherein, the identity encoding network is used to obtain the identity feature hidden code of the source face picture ; The attribute encoding network is used to obtain the attribute feature hidden code of the target face picture; the decoding network is used to fuse based on the identity feature hidden code and the attribute feature hidden code to generate a fusion face picture.
  • Both the identity encoding network and the attribute encoding network have N encoding layers connected in series, and the structures and parameters of the corresponding encoding layers of the identity encoding network and the attribute encoding network are the same.
  • the size of the identity feature hidden code obtained through the identity encoding network is the same as that of the attribute feature hidden code obtained through the attribute encoding network.
  • the input of the nth layer is the output of the n-1th layer, and n is a positive integer less than or equal to N.
  • In some embodiments, each encoding layer of the identity encoding network and the attribute encoding network adopts a ResNet block (residual neural network block). In any encoding layer, the intermediate hidden code input by the previous encoding layer is first convolved with a 1*1 convolution kernel and activated with LReLU (Leaky Rectified Linear Unit); next, it is convolved with a 3*3 convolution kernel and activated with LReLU; finally, the pixel value is increased, another 3*3 convolution is applied with LReLU activation, and the resulting intermediate hidden code is passed to the next encoding layer. A sketch of such an encoding layer is given below.
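```python
# A hedged PyTorch sketch of one encoding layer following the description above
# (1x1 conv + LReLU, 3x3 conv + LReLU, another 3x3 conv + LReLU inside a ResNet block).
# The channel counts, the shortcut projection, and the negative slope are assumptions,
# not values specified in this publication.
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, slope: float = 0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.LeakyReLU(slope),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(slope),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(slope),
        )
        # 1x1 projection so the residual shortcut matches the output channels.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h is the intermediate hidden code passed from the previous encoding layer.
        return self.body(h) + self.skip(h)
```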
  • the attribute encoding network encodes the target face image, and outputs the attribute feature hidden code through the fully connected layer.
  • Decoupling of the identity features and the attribute features is thus realized in the encoding process, which effectively avoids feature entanglement.
  • In some embodiments, the identity encoding network includes N encoding layers connected in series, N being an integer greater than 1. Obtaining the identity feature hidden code of the source face picture includes: encoding the source face picture through the 1st to n1-th encoding layers in the identity encoding network to obtain a shallow hidden code, where the shallow hidden code is used to characterize the facial appearance features of the source face picture; encoding the shallow hidden code through the n1-th to n2-th encoding layers in the identity encoding network to obtain a middle hidden code, where the middle hidden code is used to characterize the fine facial features of the source face picture; and encoding the middle hidden code through the n2-th to N-th encoding layers in the identity encoding network to obtain a deep hidden code, where the deep hidden code is used to characterize the face color features and face microscopic features of the source face picture. The identity feature hidden code includes the shallow hidden code, the middle hidden code, and the deep hidden code, and n1 and n2 are positive integers less than N.
  • In this way, the identity encoding network encodes the source face picture at multiple levels and obtains identity feature hidden codes with different receptive fields.
  • The shallow hidden code is the identity feature hidden code obtained after only a few encoding layers, so its receptive field is small: each value in the shallow hidden code maps to a small pixel area of the source face picture, and the features it captures are coarse. The shallow hidden code therefore represents the facial appearance features of the source face picture, such as the face contour, hairstyle, and pose.
  • The middle hidden code enlarges the receptive field through multiple convolutions, and each value in the middle hidden code maps to a larger pixel area of the source face picture; the features it represents are increasingly detailed. The middle hidden code therefore represents the finer facial features of the source face picture, for example, the opening and closing of the eyes and the details of the facial features.
  • The receptive field of the deep hidden code is the largest, with each value mapping to the largest pixel area of the source face picture. The deep hidden code is used to represent the finest identity features of the source face picture, such as the skin color and pupil color of the face.
  • The identity feature hidden code is composed of the shallow hidden code, the middle hidden code, and the deep hidden code; for example, with a deep hidden code of size 2*512, the identity feature hidden code has a size of 16 (8+6+2)*512, the shallow and middle hidden codes contributing 8*512 and 6*512 respectively. A sketch of such a hierarchical identity encoder is given below.
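```python
# A hedged sketch of how an identity encoding network with N serial encoding layers
# could expose shallow / middle / deep hidden codes, stacked into a 16x512 identity
# feature hidden code. The tap positions (n1, n2), the projection heads, the channel
# widths, and the 8/6/2 row split are illustrative assumptions based on the sizes
# quoted above; this is not the publication's own code.
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Stand-in for one encoding layer (the ResNet-block layer sketched earlier).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2))

class IdentityEncoder(nn.Module):
    def __init__(self, n1: int = 3, n2: int = 6, n_layers: int = 9):
        super().__init__()
        self.layers = nn.ModuleList(
            [conv_block(3 if i == 0 else 64, 64) for i in range(n_layers)]
        )
        self.n1, self.n2 = n1, n2
        # Heads projecting a feature map to k rows of a 512-dim latent (k = 8, 6, 2).
        self.to_shallow = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 8 * 512))
        self.to_middle = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 6 * 512))
        self.to_deep = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2 * 512))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        codes, h = [], x
        for i, layer in enumerate(self.layers, start=1):
            h = layer(h)
            if i == self.n1:                                    # shallow: facial appearance
                codes.append(self.to_shallow(h).view(-1, 8, 512))
            elif i == self.n2:                                  # middle: finer facial features
                codes.append(self.to_middle(h).view(-1, 6, 512))
            elif i == len(self.layers):                         # deep: color / microscopic features
                codes.append(self.to_deep(h).view(-1, 2, 512))
        return torch.cat(codes, dim=1)                          # identity code, shape (batch, 16, 512)
```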
  • In some embodiments, the decoding network includes M decoding layers, M being an integer greater than 1. Performing fusion based on the identity feature hidden code and the attribute feature hidden code to generate the fused face picture includes: performing affine transformation on the identity feature hidden code to generate M groups of control vectors; and decoding the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture. The input of the first decoding layer includes the attribute feature hidden code and the first group of control vectors, the input of the (i+1)-th decoding layer includes the output of the i-th decoding layer and the (i+1)-th group of control vectors, the output of the M-th decoding layer includes the fused face picture, and i is a positive integer less than M.
  • When the affine transformation is performed on the identity feature hidden code, the relative positional relationship between the features in the identity feature hidden code does not change; the affine transformation can filter out where a feature appears while retaining the relative relationship between features.
  • the control vector is used to control the style of the fused face image.
  • performing affine transformation on the identity feature hidden codes to generate M groups of control vectors includes: dividing the identity feature hidden codes into M groups of identity feature vectors; performing affine transformation on the M groups of identity feature vectors respectively to generate M groups of control vectors; wherein, each group of control vectors includes at least two control vectors, and different control vectors are used to represent identity features of different dimensions.
  • In some embodiments, the M groups of control vectors are obtained by grouping every two adjacent features in the identity feature hidden code. For example, when the size of the identity feature hidden code is 16*512, every two adjacent columns of identity features (each of size 1*512) are divided into one control vector group, so that 8 control vector groups are obtained.
  • Identity features of different dimensions can represent different categories of identity features of the source face picture.
  • In some embodiments, identity features of different dimensions have different receptive fields, so identity features of different dimensions represent features of different granularities.
  • In other embodiments, the receptive fields of the identity features of different dimensions are the same.
  • In this case, the identity features of different dimensions represent different types of identity features of the source face picture; for example, a certain control vector group may include features characterizing the nose shape of the source face picture together with features characterizing other facial attributes.
  • In some embodiments, decoding the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture includes the following. The i-th decoding layer among the M decoding layers receives the output of the (i-1)-th layer and the control vector group corresponding to the i-th layer, the control vector group including a first control vector and a second control vector. The decoding layer first performs an adaptive instance normalization operation on the input vector from the (i-1)-th layer with the first control vector to obtain an intermediate vector; the intermediate vector is convolved with a 3*3 convolution kernel; the convolved vector and the second control vector are subjected to an adaptive instance normalization operation; and the resulting vector is input to the (i+1)-th layer, completing the decoding operation of one decoding layer. A sketch of one such decoding layer is given below.
  • In some embodiments, the decoding network includes 8 decoding layers; the decoding network uses the attribute feature hidden code as the input of the first decoding layer, repeats the decoding steps performed by a single decoding layer 8 times, and outputs a fused face picture with a pixel size of 512*512 at the eighth decoding layer.
  • Encoding through multiple encoding layers avoids mutual entanglement between feature hidden codes, and decoding the attribute feature hidden code and the control vector groups through the decoding network allows the control vectors to control the identity features of the fused face picture, generating a realistic and natural fused face picture.
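```python
# A hedged PyTorch sketch of one decoding layer as just described: the input feature
# map is adaptively normalized with the first control vector, convolved with a 3x3
# kernel, then adaptively normalized with the second control vector. Treating each
# control vector as per-channel (scale, shift) parameters is an assumption made here
# in the spirit of StyleGAN-style modulation; upsampling between layers is omitted.
import torch
import torch.nn as nn

class AdaINBlock(nn.Module):
    def __init__(self, latent_dim: int, channels: int):
        super().__init__()
        self.affine = nn.Linear(latent_dim, 2 * channels)  # control vector -> (scale, shift)

    def forward(self, h: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        scale, shift = self.affine(w).chunk(2, dim=1)
        scale, shift = scale[:, :, None, None], shift[:, :, None, None]
        mu = h.mean(dim=(2, 3), keepdim=True)
        sigma = h.std(dim=(2, 3), keepdim=True) + 1e-5
        return scale * (h - mu) / sigma + shift

class DecodingLayer(nn.Module):
    def __init__(self, channels: int = 64, latent_dim: int = 512):
        super().__init__()
        self.adain1 = AdaINBlock(latent_dim, channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.adain2 = AdaINBlock(latent_dim, channels)

    def forward(self, h: torch.Tensor, ctrl: tuple) -> torch.Tensor:
        w1, w2 = ctrl                # one control vector group = two control vectors
        h = self.adain1(h, w1)       # modulate with the first control vector
        h = self.conv(h)             # 3x3 convolution
        h = self.adain2(h, w2)       # modulate with the second control vector
        return h                     # passed on to the (i+1)-th decoding layer
```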
  • The following embodiments introduce and explain the training process of the face fusion model.
  • The content involved in using the face fusion model and the content involved in its training process correspond to each other; where one side is not described in detail, reference may be made to the description on the other side.
  • Fig. 4 shows the flow chart of the training method of the human face fusion model provided by one embodiment of the present application.
  • The execution subject of each step of the method may be the server 10 or another computer device.
  • the method may include at least one of the following steps (410-470):
  • Step 410: Acquire training samples of the face fusion model, where the training samples include source face picture samples and target face picture samples.
  • the face fusion model includes a generative network and a discriminative network, and the generative network includes an identity encoding network, an attribute encoding network, and a decoding network.
  • the face fusion model is a generative adversarial network model.
  • the input of the face fusion model includes source face picture samples and target face picture samples.
  • Each training sample includes two image samples, one as a source face image sample and the other as a target face image sample.
  • a face fusion model capable of generating real fused face pictures can be obtained through training.
  • The two picture samples in a training sample group may show different persons and may also have different attribute characteristics.
  • In some embodiments, the training samples come from a high-definition face data set (Flickr-Faces-HQ, FFHQ), which includes face pictures of different genders, face angles, expressions, and makeup.
  • The high-definition face data set is divided into a source face picture sample group and a target face picture sample group; each training sample group selects one picture sample from the source face picture sample group and one from the target face picture sample group as its source face picture sample and target face picture sample, respectively.
  • Step 420: Obtain the identity feature hidden code of the source face picture sample through the identity encoding network, where the identity feature hidden code is used to characterize the identity features of the person in the source face picture sample.
  • The identity encoding network can decouple the above feature information, so that the identity feature hidden code of the source face picture sample obtained through the identity encoding network exhibits less feature entanglement.
  • Step 430: Obtain the attribute feature hidden code of the target face picture sample through the attribute encoding network, where the attribute feature hidden code is used to characterize the attribute features of the person in the target face picture sample.
  • The attribute encoding network can decouple the above feature information, so that the attribute feature hidden code of the target face picture sample obtained through the attribute encoding network exhibits less feature entanglement.
  • Step 440: Perform fusion based on the identity feature hidden code and the attribute feature hidden code through the decoding network to generate a fused face picture sample.
  • In some embodiments, the decoding network is a pre-trained network; during the training process of the face fusion model, the decoding network does not participate in the training and is only used to decode the identity feature hidden code and the attribute feature hidden code to generate high-definition, realistic fused face picture samples.
  • In some embodiments, the decoding network adopts the decoder in the StyleGAN network structure to decode the identity feature hidden code and the attribute feature hidden code.
  • Step 450: Determine, through the discrimination network, whether the sample to be discriminated is generated by the generation network, where the sample to be discriminated includes the fused face picture sample.
  • the discriminant network adopts a layer-by-layer growth method to judge whether the image to be discriminated is a real picture.
  • The discrimination network starts from an RGB image with a resolution of 4*4 and gradually increases the resolution, expanding the image to be discriminated to 8*8, 16*16, 32*32, and so on, until the original size of the image to be discriminated is reached.
  • After judging the image to be discriminated, the discrimination network outputs a prediction of whether the image to be discriminated is a real picture or a picture generated by the generation network.
  • Step 460: Determine the discrimination network loss based on the discrimination result of the discrimination network, and adjust the parameters in the discrimination network based on the discrimination network loss.
  • the discriminative network loss is used to measure discriminative network performance.
  • a gradient descent algorithm is used to optimize parameters in the discriminant network.
  • Step 470: Determine the generation network loss based on the fused face picture sample, the source face picture sample, the target face picture sample, and the discrimination result of the discrimination network, and adjust the parameters in the generation network based on the generation network loss.
  • the generative network loss is used to measure the performance of the identity encoding network and attribute encoding network.
  • the parameters in the identity encoding network and the parameters in the attribute encoding network are respectively optimized using a gradient descent algorithm.
  • In the above manner, the fused face picture sample is obtained from the training sample group through the generation network, the parameters of the face fusion model are adjusted through the loss functions, and adversarial training is carried out between the generation network and the discrimination network, so that the trained face fusion model has better robustness: it can adapt to source face picture samples and target face picture samples with large feature differences and produce realistic, natural fused face picture samples.
  • FIG. 5 shows a schematic diagram of a training method for a face fusion model provided by an embodiment of the present application.
  • In some embodiments, the identity encoding network includes N encoding layers connected in series, N being an integer greater than 1. Obtaining the identity feature hidden code of the source face picture sample through the identity encoding network includes: encoding the source face picture sample through the 1st to n1-th encoding layers in the identity encoding network to obtain a shallow hidden code, where the shallow hidden code is used to characterize the facial appearance features of the source face picture sample; encoding the shallow hidden code through the n1-th to n2-th encoding layers to obtain a middle hidden code, where the middle hidden code is used to characterize the fine facial features of the source face picture sample; and encoding the middle hidden code through the n2-th to N-th encoding layers to obtain a deep hidden code, where the deep hidden code is used to characterize the face color features and face microscopic features of the source face picture sample. The identity feature hidden code includes the shallow hidden code, the middle hidden code, and the deep hidden code, and n1 and n2 are positive integers less than N.
  • In some embodiments, the decoding network includes M decoding layers, M being an integer greater than 1. Performing fusion through the decoding network based on the identity feature hidden code and the attribute feature hidden code to generate the fused face picture sample includes: performing affine transformation on the identity feature hidden code to generate M groups of control vectors; and decoding the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture sample. The input of the first decoding layer includes the attribute feature hidden code and the first group of control vectors, the input of the (i+1)-th decoding layer includes the output of the i-th decoding layer and the (i+1)-th group of control vectors, the output of the M-th decoding layer includes the fused face picture sample, and i is a positive integer less than M.
  • performing affine transformation on the identity feature hidden codes to generate M groups of control vectors includes: dividing the identity feature hidden codes into M groups of identity feature vectors; performing affine transformation on the M groups of identity feature vectors respectively to generate M groups of control vectors; wherein, each group of control vectors includes at least two control vectors, and different control vectors are used to represent identity features of different dimensions.
  • In some embodiments, the discrimination network loss is determined based on the discrimination result; the discrimination loss is the adversarial loss of the discrimination network and can be calculated by the following formula:
  • where x represents a real picture sample, G(x_s) represents the fused face picture sample generated by the generation network, D(G(x_s)) represents the discrimination result of the discrimination network for the fused face picture sample, and D(x) represents the discrimination result of the discrimination network for the real face picture sample.
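  • The formula itself is not reproduced in this text; a standard logistic adversarial loss consistent with the symbol definitions above (and with the logistic regression loss mentioned in the training procedure below) would be L_d = -E[log D(x)] - E[log(1 - D(G(x_s)))], stated here as an assumption rather than the exact expression used in the original publication.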
  • The discrimination result of the discrimination network is 0 or 1.
  • A discrimination result of 0 means the discrimination network believes the picture to be discriminated is generated by the generation network (fake), and a discrimination result of 1 means the discrimination network believes the picture to be discriminated is real.
  • In some embodiments, determining the generation network loss based on the fused face picture sample, the source face picture sample, the target face picture sample, and the discrimination result of the discrimination network includes: determining a perceptual similarity loss based on the target face picture sample and the fused face picture sample, where the perceptual similarity loss is used to characterize the picture style difference between the target face picture sample and the fused face picture sample; determining a multi-scale identity feature loss based on the source face picture sample and the fused face picture sample, where the multi-scale identity feature loss is used to characterize the identity feature difference between the source face picture sample and the fused face picture sample; determining a face pose loss based on the target face picture sample and the fused face picture sample, where the face pose loss is used to characterize the face pose difference between the target face picture sample and the fused face picture sample; determining the generation network adversarial loss based on the discrimination result; and determining the generation network loss according to the perceptual similarity loss, the multi-scale identity feature loss, the face pose loss, and the adversarial loss.
  • In some embodiments, determining the perceptual similarity loss based on the target face picture sample and the fused face picture sample includes: extracting the visual features of the target face picture sample and the visual features of the fused face picture sample respectively through a visual feature extraction network; and calculating the similarity between the visual features of the target face picture sample and the visual features of the fused face picture sample to obtain the perceptual similarity loss.
  • the perceptual similarity loss can be calculated by the following formula:
  • where x_t represents the target face picture sample, y_s2t represents the fused face picture sample, F(x_t) is the visual feature of the target face picture sample extracted through the visual feature extraction network, and F(y_s2t) is the visual feature of the fused face picture sample extracted through the visual feature extraction network.
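  • The formula itself is not reproduced in this text; an LPIPS-style loss consistent with these definitions would be L_LPIPS = ||F(x_t) - F(y_s2t)||, i.e., a distance between the two visual features, with the exact distance measure left unspecified here.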
  • In some embodiments, determining the multi-scale identity feature loss based on the source face picture sample and the fused face picture sample includes: extracting the identity feature hidden code of the source face picture sample and the identity feature hidden code of the fused face picture sample respectively through an identity feature extraction network; and calculating the similarity between the identity feature hidden code of the source face picture sample and the identity feature hidden code of the fused face picture sample to obtain the multi-scale identity feature loss.
  • the multi-scale identity feature loss can be calculated by the following formula:
  • where x_s represents the source face picture sample, y_s2t represents the fused face picture sample, N(x_s) is the identity feature of the source face picture sample extracted through the identity feature extraction network, and N(y_s2t) is the identity feature of the fused face picture sample extracted through the identity feature extraction network.
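  • The formula itself is not reproduced in this text; a common form consistent with these definitions would be L_ID = 1 - cos(N(x_s), N(y_s2t)), i.e., one minus the cosine similarity between the two identity features, with the exact similarity measure left unspecified here.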
  • In some embodiments, a VGG (Visual Geometry Group) network is used as the identity feature extraction network to extract the identity features of the source face picture sample and the fused face picture sample.
  • In some embodiments, determining the face pose loss based on the target face picture sample and the fused face picture sample includes: extracting the face pose Euler angles of the target face picture sample and the face pose Euler angles of the fused face picture sample respectively, and calculating the similarity between them to obtain the face pose loss.
  • the face pose loss can be calculated by the following formula:
  • where x_t represents the target face picture sample, y_s2t represents the fused face picture sample, E(x_t) is the face pose Euler angle of the target face picture sample obtained through the face pose prediction network, and E(y_s2t) is the face pose Euler angle of the fused face picture sample obtained through the face pose prediction network.
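  • The formula itself is not reproduced in this text; a face pose loss consistent with these definitions would be L_POSE = ||E(x_t) - E(y_s2t)||, i.e., a distance between the face pose Euler angles of the target and fused face picture samples, with the exact norm left unspecified here.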
  • In some embodiments, an MTCNN (Multi-task Cascaded Convolutional Network) is used as the face pose prediction network to respectively extract the face pose Euler angles of the target face picture sample and the fused face picture sample.
  • In some embodiments, the adversarial loss of the generation network is determined based on the discrimination result and can be calculated by the following formula:
  • where G(x_s) represents the fused face picture sample generated by the generation network, and D(G(x_s)) represents the discrimination result of the discrimination network for the fused face picture sample.
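  • The formula itself is not reproduced in this text; a logistic generator adversarial loss consistent with these definitions would be L_g = -E[log D(G(x_s))], stated here as an assumption rather than the exact expression used in the original publication.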
  • the training process of the face fusion model is as follows:
  • each group of training samples includes a source face image sample and a target face image sample
  • the loss function of the discriminant network is determined by the logistic regression loss function, and the parameters in the discriminant network are optimized by gradient descent;
  • L_total = W_LPIPS * L_LPIPS + W_ID * L_ID + W_POSE * L_POSE + W_gan * (L_g + L_d)
  • where W_LPIPS, W_ID, W_POSE, and W_gan are the weights of the corresponding losses in the total loss.
  • In some embodiments, the values of W_LPIPS, W_ID, W_POSE, and W_gan are 1, 5, 5, and 5, respectively.
  • In some embodiments, 16 training stages are performed on the training sample set to obtain a face fusion model that can generate realistic fused face pictures.
  • the face fusion model can better adjust the parameters during the training process.
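  • The sketch below illustrates one possible training step that combines the losses and weights described above. The generator, discriminator, and the feature extractors F (visual features), N (identity features), and E (pose Euler angles) are placeholders; the loss forms follow the hedged formulas given earlier rather than the original publication, and the discriminator and generator are updated separately, as is usual GAN practice.

```python
# A hedged sketch of one training step of the face fusion model. All module names
# and exact loss forms are assumptions for illustration; disc is assumed to output
# probabilities in [0, 1].
import torch
import torch.nn.functional as F_nn

W_LPIPS, W_ID, W_POSE, W_GAN = 1.0, 5.0, 5.0, 5.0   # loss weights quoted above

def train_step(gen, disc, F, N, E, x_s, x_t, opt_g, opt_d):
    # ---- discriminator update ----
    y_s2t = gen(x_s, x_t).detach()                   # fused face picture sample (no grad to G)
    d_real, d_fake = disc(x_t), disc(y_s2t)
    loss_d = F_nn.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F_nn.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # ---- generator (identity + attribute encoders) update ----
    y_s2t = gen(x_s, x_t)
    d_fake = disc(y_s2t)
    loss_lpips = (F(x_t) - F(y_s2t)).abs().mean()                    # perceptual similarity
    loss_id = 1 - F_nn.cosine_similarity(N(x_s), N(y_s2t)).mean()    # multi-scale identity
    loss_pose = (E(x_t) - E(y_s2t)).abs().mean()                     # face pose (Euler angles)
    loss_g_adv = F_nn.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_g = W_LPIPS * loss_lpips + W_ID * loss_id + W_POSE * loss_pose + W_GAN * loss_g_adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```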
  • FIG. 6 shows a block diagram of an apparatus for fusing human face pictures according to an embodiment of the present application.
  • the device has the function of realizing the fusion method of the above-mentioned human face picture, and the function can be realized by hardware, and can also be realized by executing corresponding software by the hardware.
  • the device may be the electronic device described above, or may be set in the electronic device.
  • the apparatus 600 may include: a face picture acquisition module 610 , an identity feature acquisition module 620 , an attribute feature acquisition module 630 , and a fusion picture generation module 640 .
  • the human face image obtaining module 610 is configured to obtain a source human face image and a target human face image.
  • the identity feature acquisition module 620 is configured to acquire the identity feature hidden code of the source face picture, and the identity feature hidden code is used to characterize the identity feature of the person in the source face picture.
  • the attribute feature acquisition module 630 is configured to acquire the attribute feature hidden code of the target face picture, and the attribute feature hidden code is used to characterize the attribute feature of the person in the target face picture.
  • the fusion picture generation module 640 is configured to perform fusion based on the identity feature hidden code and the attribute feature hidden code to generate a fusion face picture.
  • the fusion face picture is generated by a face fusion model, and the face fusion model includes an identity encoding network, an attribute encoding network, and a decoding network; wherein, the identity encoding network is used to obtain the source The identity feature hidden code of the face picture; the attribute encoding network is used to obtain the attribute feature hidden code of the target face picture; the decoding network is used to perform based on the identity feature hidden code and the attribute feature hidden code Fusion, generating the fusion face picture.
  • In some embodiments, the identity encoding network includes N encoding layers connected in series, N being an integer greater than 1.
  • The identity feature acquisition module 620 is configured to: encode the source face picture through the 1st to n1-th encoding layers in the identity encoding network to obtain a shallow hidden code, where the shallow hidden code is used to characterize the facial appearance features of the source face picture; encode the shallow hidden code through the n1-th to n2-th encoding layers in the identity encoding network to obtain a middle hidden code, where the middle hidden code is used to characterize the fine facial features of the source face picture; and encode the middle hidden code through the n2-th to N-th encoding layers in the identity encoding network to obtain a deep hidden code, where the deep hidden code is used to characterize the face color features and face microscopic features of the source face picture; the identity feature hidden code includes the shallow hidden code, the middle hidden code, and the deep hidden code, and n1 and n2 are positive integers less than N.
  • In some embodiments, the decoding network includes M decoding layers, M being an integer greater than 1, and the fused picture generation module 640 includes: a control vector generation unit, configured to perform affine transformation on the identity feature hidden code to generate M groups of control vectors; and a fusion unit, configured to decode the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture; wherein the input of the first decoding layer includes the attribute feature hidden code and the first group of control vectors, the input of the (i+1)-th decoding layer includes the output of the i-th decoding layer and the (i+1)-th group of control vectors, the output of the M-th decoding layer includes the fused face picture, and i is a positive integer less than M.
  • In some embodiments, the control vector generation unit is configured to divide the identity feature hidden code into M groups of identity feature vectors and to perform affine transformation on the M groups of identity feature vectors respectively to generate the M groups of control vectors; wherein each group of control vectors includes at least two control vectors, and different control vectors are used to represent identity features of different dimensions.
  • FIG. 7 shows a block diagram of a training device for a face fusion model provided by an embodiment of the present application.
  • the device has the function of realizing the above-mentioned training method of the human face fusion model, and the function can be realized by hardware, and can also be realized by executing corresponding software by the hardware.
  • The device may be the computer device described above, or may be set in the computer device.
  • the device 700 may include: a training sample acquisition module 710, an identity feature acquisition module 720, an attribute feature acquisition module 730, a fusion picture generation module 740, a face picture discrimination module 750, a first parameter adjustment module 760 and a second parameter adjustment module 770 .
  • the training sample acquisition module 710 is configured to acquire training samples of the human face fusion model, the training samples include source human face picture samples and target human face picture samples.
  • the identity feature acquisition module 720 is configured to acquire the identity feature hidden code of the source face picture sample through the identity encoding network, and the identity feature hidden code is used to characterize the identity feature of the person in the source face picture sample.
  • the attribute feature acquisition module 730 is configured to acquire the attribute feature hidden code of the target face picture sample through the attribute encoding network, and the attribute feature hidden code is used to characterize the attribute feature of the person in the target face picture sample.
  • the fused picture generation module 740 is configured to perform fusion based on the identity feature hidden code and the attribute feature hidden code through the decoding network to generate a fused face picture sample.
  • The face picture discrimination module 750 is configured to determine, through the discrimination network, whether the sample to be discriminated is generated by the generation network, where the sample to be discriminated includes the fused face picture sample.
  • the first parameter adjustment module 760 is configured to determine a discriminant network loss based on the discrimination result of the discriminant network, and adjust parameters in the discriminant network based on the discriminative network loss.
  • The second parameter adjustment module 770 is configured to determine a generation network loss based on the fused face picture sample, the source face picture sample, the target face picture sample, and the discrimination result of the discrimination network, and to adjust the parameters in the generation network based on the generation network loss.
  • In some embodiments, the identity encoding network includes N encoding layers connected in series, N being an integer greater than 1, and the identity feature acquisition module 720 is configured to: encode the source face picture sample through the 1st to n1-th encoding layers in the identity encoding network to obtain a shallow hidden code, where the shallow hidden code is used to characterize the facial appearance features of the source face picture sample; encode the shallow hidden code through the n1-th to n2-th encoding layers in the identity encoding network to obtain a middle hidden code, where the middle hidden code is used to characterize the fine facial features of the source face picture sample; and encode the middle hidden code through the n2-th to N-th encoding layers in the identity encoding network to obtain a deep hidden code, where the deep hidden code is used to characterize the face color features and face microscopic features of the source face picture sample; the identity feature hidden code includes the shallow hidden code, the middle hidden code, and the deep hidden code, and n1 and n2 are positive integers less than N.
  • In some embodiments, the decoding network includes M decoding layers, M being an integer greater than 1, and the fused picture generation module 740 is configured to: perform affine transformation on the identity feature hidden code to generate M groups of control vectors; and decode the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture sample; wherein the input of the first decoding layer includes the attribute feature hidden code and the first group of control vectors, the input of the (i+1)-th decoding layer includes the output of the i-th decoding layer and the (i+1)-th group of control vectors, the output of the M-th decoding layer includes the fused face picture sample, and i is a positive integer less than M.
  • In some embodiments, the second parameter adjustment module 770 includes: a first loss function unit, configured to determine a perceptual similarity loss based on the target face picture sample and the fused face picture sample, where the perceptual similarity loss is used to characterize the picture style difference between the target face picture sample and the fused face picture sample; a second loss function unit, configured to determine a multi-scale identity feature loss based on the source face picture sample and the fused face picture sample, where the multi-scale identity feature loss is used to characterize the identity feature difference between the source face picture sample and the fused face picture sample; and a third loss function unit, configured to determine a face pose loss based on the target face picture sample and the fused face picture sample, where the face pose loss is used to characterize the face pose difference between the target face picture sample and the fused face picture sample. The module is further configured to determine the generation network adversarial loss based on the discrimination result, and to determine the generation network loss according to the perceptual similarity loss, the multi-scale identity feature loss, the face pose loss, and the adversarial loss.
  • the first loss function unit is configured to extract the visual features of the target human face picture sample and the visual features of the fusion human face picture sample respectively through a visual feature extraction network; calculate the target The similarity between the visual features of the human face picture sample and the visual features of the fusion human face picture sample is obtained to obtain the perceptual similarity loss.
  • the second loss function unit is configured to extract the identity feature hidden code of the source face picture sample and the identity feature hidden code of the fused face picture sample respectively through the identity feature extraction network. code; calculate the similarity between the identity feature hidden code of the source face picture sample and the identity feature hidden code of the fusion face picture sample, and obtain the multi-scale identity feature loss.
  • the third loss function unit is configured to separately extract the face pose Euler angles of the target face image sample and the face of the fused face image sample through the face pose prediction network Posture Euler angles: Calculate the similarity between the face pose Euler angles of the target face picture sample and the face pose Euler angles of the fusion face picture sample to obtain the face pose loss.
  • the division of the above-mentioned functional modules is used as an example for illustration. In practical applications, the above-mentioned function allocation can be completed by different functional modules according to the needs.
  • That is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the device and the method embodiment provided by the above embodiment belong to the same idea, and the specific implementation process thereof is detailed in the method embodiment, and will not be repeated here.
  • FIG. 8 shows a structural block diagram of a computer device 800 provided by an embodiment of the present application.
  • the computer device 800 can be used to implement the above method for generating a fused face; it can also be used to implement the above method for training a human face fusion model.
  • the computer device 800 includes: a processor 801 and a memory 802 .
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 801 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 801 may also include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 802 may also include high-speed random access memory, and non-volatile memory.
  • the structure shown in FIG. 8 does not constitute a limitation to the computer device 800, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
  • a computer device comprising a processor and a memory in which a computer program is stored.
  • the computer program is configured to be executed by one or more processors, so as to realize the above-mentioned fusion method of human face pictures, or realize the above-mentioned training method of human face fusion model.
  • the computer device may be referred to as an image processing device when it is used to implement the fusion method of face pictures.
  • the computer device may also be referred to as a model training device when it is used to implement the training method of the face fusion model.
  • a computer-readable storage medium is also provided, and a computer program is stored in the storage medium, and when the computer program is executed by a processor of a computer device, the above-mentioned fusion method of a human face picture is realized, Or realize the training method of the above-mentioned face fusion model.
  • the above-mentioned computer-readable storage medium may be ROM (Read-Only Memory, read only memory), RAM (Random Access Memory, random access memory), etc.
  • a computer program product is also provided; when the computer program product runs on a computer device, the computer device is caused to perform the above fusion method of face pictures, or the above training method of the face fusion model.
  • the "plurality” mentioned herein refers to two or more than two.
  • "And/or" describes the association relationship of associated objects, indicating that there may be three types of relationships; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone.
  • the character "/” generally indicates that the contextual objects are an "or” relationship.
  • the numbering of the steps described herein only exemplarily shows one possible execution order of the steps. In some other embodiments, the above steps may not be executed in the numbered order; for example, two steps with different numbers may be executed at the same time, or two steps with different numbers may be executed in an order opposite to that shown in the figures, which is not limited in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A fusion method, apparatus, device and storage medium for face pictures, belonging to the field of machine learning. The method includes: acquiring a source face picture and a target face picture (210); acquiring an identity feature hidden code of the source face picture, the identity feature hidden code being used to characterize the identity features of the person in the source face picture (220); acquiring an attribute feature hidden code of the target face picture, the attribute feature hidden code being used to characterize the attribute features of the person in the target face picture (230); and performing fusion based on the identity feature hidden code and the attribute feature hidden code to generate a fused face picture (240). With the above fusion method, a realistic fused face picture can be generated even when the features of the source face and the target face differ greatly.

Description

人脸图片的融合方法、装置、设备及存储介质
本申请要求于2021年09月16日提交的、申请号为202111089159.1、发明名称为“人脸图片的融合方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及机器学习技术领域,特别涉及一种人脸图片的融合方法、装置、设备及存储介质。
背景技术
人脸融合是指将两张人脸图片融合成一张人脸图片的过程,通过人脸融合过程获得的人脸同时具有两张图片中人脸的特征。现阶段,人脸融合技术在各类照片修图、视频剪辑等领域有广泛应用。
在相关技术中,采用三角剖分的方法对源人脸图片和目标人脸图片进行划分获得融合图片。首先,将源人脸图片与目标人脸图片中的人脸位置进行对齐;并分别在源人脸图片与目标人脸图片上提取能表示人物身份的特征点和定位点,通常选择人脸图片中五官轮廓上的点作为特征点,选择画面边缘和人脸轮廓线上的点作为定位点;将定位点分别与特征点连接,根据三角剖分算法获得若干个三角剖分区;对于源人脸图片上任意一个三角剖分区,在目标人脸图片上找到相应的三角剖分区,针对上述两个三角剖分区进行映射变换,得到融合三角剖分区,基于上述两个三角剖分区的像素值确定融合三角剖分区的像素值;基于所有融合三角剖分区生成融合人脸图片。
然而,在通过三角剖分方法进行人脸融合时,在源人脸与目标人脸特征差异较大的情况下,例如源人脸图片与目标人脸图片的人脸角度或人脸肤色或光照条件等方面差异较大时,基于三角剖分的人脸融合方法无法融合出自然和谐的人脸。
发明内容
本申请实施例提供了一种人脸图片的融合方法、装置、设备及存储介质。技术方案如下:
根据本申请实施例的一个方面,提供了一种人脸图片的融合方法,所述方法由计算机设备执行,所述方法包括:
获取源人脸图片和目标人脸图片;
获取所述源人脸图片的身份特征隐码,所述身份特征隐码用于表征所述源人脸图片中人物身份特征;
获取所述目标人脸图片的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片中人物属性特征;
基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片。
根据本申请实施例的一个方面,提供了一种人脸融合模型的训练方法,所述方法由计算机设备执行,所述人脸融合模型包括生成网络和判别网络,所述生成网络包括身份编码网络、属性编码网络和解码网络;所述方法包括:
获取人脸融合模型的训练样本,所述训练样本包括源人脸图片样本和目标人脸图片样本;
通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,所述身份特征隐码是用于表征所述源人脸图片样本中人物身份特征;
通过所述属性编码网络获取所述目标人脸图片样本的属性特征隐码,所述属性特征隐码 用于表征所述目标人脸图片样本中人物属性特征;
通过所述解码网络基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片样本;
通过所述判别网络确定待判别样本是否由所述生成网络生成,所述待判别样本包括所述融合人脸图片样本;
基于所述判别网络的判别结果确定判别网络损失,以及基于所述判别网络损失对所述判别网络中的参数进行调整;
基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,以及基于所述生成网络损失对所述生成网络中的参数进行调整。
根据本申请实施例的一个方面,提供了一种人脸图片的融合装置,所述装置包括:
人脸图片获取模块,配置为获取源人脸图片和目标人脸图片;
身份特征获取模块,配置为获取所述源人脸图片的身份特征隐码,所述身份特征隐码用于表征所述源人脸图片中人物身份特征;
属性特征获取模块,配置为获取所述目标人脸图片的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片中人物属性特征;
融合图片生成模块,配置为基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片。
根据本申请实施例的一个方面,提供了一种人脸融合模型的训练装置,所述人脸融合模型包括生成网络和判别网络,所述生成网络包括身份编码网络、属性编码网络和解码网络;所述装置包括:
训练样本获取模块,配置为获取人脸融合模型的训练样本,所述训练样本包括源人脸图片样本和目标人脸图片样本;
身份特征获取模块,配置为通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,所述身份特征隐码是用于表征所述源人脸图片样本中人物身份特征;
属性特征获取模块,配置为通过所述属性编码网络获取所述目标人脸图片样本的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片样本中人物属性特征;
融合图片生成模块,配置为通过所述解码网络基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片样本;
人脸图片判别模块,配置为通过所述判别网络确定待判别样本是否由所述生成网络生成,所述待判别样本包括所述融合人脸图片样本;
第一参数调整模块,配置为基于所述判别网络的判别结果确定判别网络损失,以及基于所述判别网络损失对所述判别网络中的参数进行调整;
第二参数调整模块,配置为基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,以及基于所述生成网络损失对所述生成网络中的参数进行调整。
根据本申请实施例的一个方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有计算机程序,所述处理器执行所述计算机程序以实现上述人脸图片的融合方法,或实现上述人脸融合模型的训练方法。
根据本申请实施例的一个方面,提供了一种计算机可读存储介质,所述存储介质中存储有计算机程序,所述计算机程序用于被处理器执行,以实现上述人脸图片的融合方法,或实现上述人脸融合模型的训练方法。
根据本申请的一个方面,提供了一种计算机程序产品,当所述计算机程序产品在计算机设备上运行时,使得计算机设备执行上述人脸图片的融合方法,或者上述人脸融合模型的训练方法。
本申请实施例提供的技术方案可以带来如下有益效果:
通过对源人脸图片的身份特征隐码进行提取,对目标人脸图片的属性特征隐码进行提取,根据身份特征隐码和属性特征隐码进行融合,获得融合人脸图片,提供了一种生成真实度高的融合人脸图片的方法,即使在源人脸图片与目标人脸图片之间人脸角度、肤色等特征差异过大的情况下,也能够生成清晰、逼真的融合人脸图片。
附图说明
图1是本申请一个实施例提供的实施环境的示意图;
图2是本申请一个实施例提供的人脸图片的融合方法的流程图;
图3本申请另一个实施例提供的人脸图片的融合方法的示意图;
图4是本申请一个实施例提供的人脸融合模型的训练方法的流程图;
图5是本申请一个实施例提供的人脸融合模型的训练方法的示意图;
图6是本申请一个实施例提供的人脸图片的融合装置的框图;
图7是本申请另一个实施例提供的人脸融合模型的训练装置的框图;
图8是本申请一个实施例提供的计算机设备的示意图。
Detailed Description of the Embodiments

Before the technical solutions of the present application are introduced, some background technical knowledge involved in the present application is first explained. The following related technologies, as optional solutions, may be combined arbitrarily with the technical solutions of the embodiments of the present application, and all such combinations fall within the protection scope of the embodiments of the present application. The embodiments of the present application include at least part of the following content.

Some terms appearing in the present application are introduced below.

Computer Vision (CV) refers to a computer automatically extracting, analyzing and understanding useful information from one image or a series of pictures. Computer vision technology covers fields such as scene reconstruction, event detection, video tracking, object recognition, three-dimensional pose estimation, motion estimation and image restoration, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition, as well as technologies such as face fusion.

A Generative Adversarial Network (GAN) consists of a generative neural network and a discriminative neural network. The generative neural network processes input data to produce generated data, and the discriminative neural network distinguishes real data from generated data. During training, the two networks compete with each other: the generative neural network adjusts its own network parameters according to the generator loss function so that the generated data can mislead the judgment of the discriminative neural network, while the discriminative neural network adjusts its own network parameters according to the discriminator loss function so that it can correctly distinguish real data from generated data. After a certain number of training iterations, the generated data produced by the generative neural network is close to the real data, and the discriminator can no longer tell the generated data from the real data.
The affine transformation is introduced below.

An affine transformation (Affine Transformation, AF) means, in geometry, applying a linear transformation to a vector space followed by a translation, obtaining a new vector space.

Taking a two-dimensional vector space as an example, a two-dimensional coordinate (x, y) is mapped to a two-dimensional coordinate (u, v) by an affine transformation as follows:

u = a_1 * x + b_1 * y + c_1

v = a_2 * x + b_2 * y + c_2

Operations such as translation, scaling and rotation of a two-dimensional image can be realized through affine transformation.

An affine transformation preserves the straightness and parallelism of a two-dimensional image. Straightness means that a straight line is still a straight line after the affine transformation, and a circular arc is still a circular arc; parallelism means that the relative positional relationship between straight lines does not change after the affine transformation, and the relative positions of points on a straight line do not change after the affine transformation.
The Adaptive Instance Normalization (AdaIN) operation is introduced below.

The AdaIN operation takes as input a content x and a style feature y, and matches the channel-wise mean and variance of x to the mean and variance of y according to the following formula:

AdaIN(x, y) = σ(y) * (x - μ(x)) / σ(x) + μ(y)

For example, given a style feature with a particular texture pattern, after normalization by an AdaIN layer, style features with this texture produce a higher average activation in that layer. The output produced by AdaIN preserves the spatial structure of the content x while having a high average activation for this style feature. A decoder can then transfer this style feature into the image space of the content x, and the variance of the texture style feature passes finer style information into the AdaIN output and the final output image. In short, AdaIN realizes style transfer in feature space by transferring feature statistics, namely the channel-wise mean and variance.
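As a concrete illustration of the AdaIN formula above, a minimal sketch in PyTorch follows; the (N, C, H, W) tensor layout and the use of per-channel statistics are assumptions of this example.

    import torch

    def adain(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        # AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y),
        # with mean and standard deviation computed per channel.
        mu_x = x.mean(dim=(2, 3), keepdim=True)
        sigma_x = x.std(dim=(2, 3), keepdim=True) + eps
        mu_y = y.mean(dim=(2, 3), keepdim=True)
        sigma_y = y.std(dim=(2, 3), keepdim=True)
        return sigma_y * (x - mu_x) / sigma_x + mu_y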
请参考图1,其示出了本申请一个实施例提供的方案实施环境示意图。该方案实施环境可以实现称为一个人脸融合系统。该方案系统构架可以包括服务器10和至少一个终端设备20。
终端设备20可以是诸如手机、平板电脑、PC(Personal Computer,个人计算机)、智能电视、多媒体播放设备等电子设备。目标应用程序上携带了人脸融合模型,终端设备20上运行有目标应用程序,该目标应用程序可以是拍照应用程序、视频应用程序和社交应用程序等,目标应用程序的类型在此不进行限定。在一些实施例中,目标应用程序部署在终端设备20上,人脸图片的融合过程可以在终端设备上进行,终端设备获取源人脸图片和目标人脸图片,针对源人脸图片提取身份特征隐码,针对目标人脸图片提取属性特征隐码,并将身份特征隐码与属性特征隐码进行融合,生成融合人脸图片,完成人脸图片的融合过程。
服务器10是可以运行目标应用程序的后台服务器。服务器10可以是一台服务器,也可以是由多台服务器组成的服务器集群,或者是一个云计算服务中心。在另一些实施例中,人脸图片的融合过程也可以在服务器10上进行,终端设备20将获取到的源人脸图片和目标人脸图片上传给服务器10,服务器10针对源人脸图片提取身份特征隐码,针对目标人脸图片提取属性特征隐码,并将身份特征隐码与属性特征隐码进行融合,生成融合人脸图片,并将生成的融合图片发送给终端设备20,完成人脸图片的融合过程。
终端设备20和服务器10之间可以通过网络进行通信。
本申请实施例描述的系统架构以及业务场景是为了更加清楚地说明本申请实施例的技术方案,并不构成对本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着方案实施环境的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
请参考图2,其示出了本申请一个实施例提供的人脸图片的融合方法的流程图,该方法各步骤的执行主体可以是图1所示方案实施环境中的终端设备20,也可以是服务器10。为了描述方便,下面以计算机设备作为执行主体,该方法可以包括如下几个步骤(210-240)中的至少一个步骤:
步骤210,获取源人脸图片和目标人脸图片。
源人脸图片是指需要按照某种样式进行改造的人脸图片,源人脸图片一般是用户提供的、真实的图片,例如用户通过手机、相机等工具拍摄的人物照片。目标人脸图片是指能为源人脸图片提供样式变化的人脸图片,目标人脸图片可以是由终端设备上的应用程序提供的人脸图片,也可以是用户上传的人脸图片。在本申请实施例中,对源人脸图片与目标人脸图片的获取方式不作限定。
步骤220,获取源人脸图片的身份特征隐码,身份特征隐码用于表征源人脸图片中人物身份特征。
身份特征隐码用于表征源人脸图片中人脸的五官的形状、五官之间的相对位置和脸型等特征,这些特征与人物身份有关。也即通常不同人脸具有不同的人脸的五官的形状、五官之 间的相对位置和脸型特征。因此,从不同的源人脸图片中能获取到不同的身份特征隐码。在一些实施例中,身份特征隐码是通过身份编码网络对源人脸图片进行编码获取的。
步骤230,获取目标人脸图片的属性特征隐码,属性特征隐码用于表征目标人脸图片中人物属性特征。
目标人脸图片中的人物属性特征包括但不限于以下至少一种:目标人脸图片中人脸妆容、人脸肤色、人物发型、配饰和头部姿势等特征。目标人脸图片的头部姿势特征是指三维空间下目标人脸的偏转角度在二维图片中的映射,目标人脸是指目标人脸图片中的人脸,目标人脸的头部姿势包括俯仰角(pitch)、偏航角(yaw)和旋转角(roll),例如,在正视镜头的情况下,目标人脸图片的头部姿势的俯仰角、偏航角和旋转角均为0°。在一些实施例中,属性特征隐码是通过属性编码网络对目标人脸图片进行编码获取的。
在一些实施例中,获取源人脸图片的身份特征隐码和获取目标人脸图片的属性特征隐码在两个不同的编码网络中进行,因此获取源人脸图片的身份特征隐码和获取目标人脸图片的属性特征隐码可以同时进行,也可以依次先后进行,本申请对此不作限定。
步骤240,基于身份特征隐码和属性特征隐码进行融合,生成融合人脸图片。
融合人脸图片是指兼具源人脸图片的身份特征和目标人脸图片的属性特征的图片,融合人脸图片中的人脸在视觉效果上更接近源人脸图片,在人物妆容姿态上更接近目标人脸图片。人脸融合模型中包括身份编码网络和属性编码网络。在一些实施例中,人脸融合模型基于身份特征隐码和属性特征隐码进行融合,生成融合人脸图片。
综上所述,本申请实施例提供的技术方案,通过获取源人脸图片和目标人脸图片;基于源人脸图片获取身份特征隐码并基于目标人脸图片获取属性特征隐码;对身份特征隐码和属性特征隐码进行融合,得到自然、逼真的融合人脸图片。
此外,相关技术中,通过将源人脸图片和目标人脸图片对应的三角剖分区进行融合获得融合人脸图片,在源人脸图片和目标人脸图片特征差异较大的情况下,融合人脸图片中的某些特征受到源人脸图片和目标人脸图片的共同影响,导致在融合人脸图片中相应的特征不符合实际,使得融合图片中的人脸真实性较差。本实施例通过源人脸图片获取身份特征隐码,通过目标人脸图片获得属性特征隐码,在融合过程中使用身份特征隐码控制融合人脸图片中生成人脸的身份特征,通过属性特征隐码控制融合人脸图片中生成人脸的属性特征,避免了源人脸图片中人脸的特征和目标人脸图片中人脸的特征存在较大差异时,生成的融合人脸图片不真实的情况。
The method of generating a fused face picture through the face fusion model is described below.

Please refer to FIG. 3, which shows a schematic diagram of a face picture fusion method provided by another embodiment of the present application.

In some embodiments, the fused face picture is generated by a face fusion model, and the face fusion model includes an identity encoding network, an attribute encoding network and a decoding network; the identity encoding network is used to obtain the identity feature hidden code of the source face picture, the attribute encoding network is used to obtain the attribute feature hidden code of the target face picture, and the decoding network is used to perform fusion based on the identity feature hidden code and the attribute feature hidden code to generate the fused face picture.

In some embodiments, the identity encoding network and the attribute encoding network each include N encoding layers connected in series, and the corresponding encoding layers of the two networks have the same structure and corresponding parameters, so the identity feature hidden code obtained through the identity encoding network and the attribute feature hidden code obtained through the attribute encoding network have the same size. In both networks, the input of the n-th layer is the output of the (n-1)-th layer, n being a positive integer not greater than N. In some embodiments, every encoding layer of the identity encoding network and the attribute encoding network adopts a ResNet block (residual neural network block): in any encoding layer, the intermediate hidden code input from the previous encoding layer is first convolved with a 1x1 convolution kernel and activated with LReLU (Leaky Rectified Linear Unit); next, it is convolved with a 3x3 convolution kernel and activated with LReLU; finally, the resolution is increased, another 3x3 convolution is applied and activated with LReLU, and the resulting intermediate hidden code is passed to the next encoding layer.

The attribute encoding network encodes the target face picture and outputs the attribute feature hidden code through a fully connected layer.

By encoding the source face picture with an identity encoding network having N encoding layers and encoding the target face picture with an attribute encoding network having N encoding layers, the identity features and the attribute features are decoupled during encoding, which effectively avoids feature entanglement.
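A minimal sketch of one such encoding layer is given below, assuming PyTorch. The channel counts, the LReLU negative slope, the resolution-change factor and the omission of the residual shortcut are assumptions of this example; only the three convolution-plus-activation steps enumerated above are shown.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EncodingLayer(nn.Module):
        """One encoding layer: 1x1 conv + LReLU, 3x3 conv + LReLU,
        resolution change + 3x3 conv + LReLU."""

        def __init__(self, in_ch: int, out_ch: int, slope: float = 0.2):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
            self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
            self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
            self.slope = slope

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            h = F.leaky_relu(self.conv1(h), self.slope)
            h = F.leaky_relu(self.conv2(h), self.slope)
            # Resolution change before the last convolution (factor assumed).
            h = F.interpolate(h, scale_factor=2, mode="bilinear", align_corners=False)
            return F.leaky_relu(self.conv3(h), self.slope)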
In some embodiments, the identity encoding network includes N encoding layers connected in series, N being an integer greater than 1. Obtaining the identity feature hidden code of the source face picture includes: encoding the source face picture through the 1st to the n1-th encoding layers of the identity encoding network to obtain a shallow hidden code, the shallow hidden code being used to characterize facial appearance features of the source face picture; encoding the shallow hidden code through the n1-th to the n2-th encoding layers to obtain a middle hidden code, the middle hidden code being used to characterize fine facial features of the source face picture; and encoding the middle hidden code through the n2-th to the N-th encoding layers to obtain a deep hidden code, the deep hidden code being used to characterize face color features and face micro features of the source face picture. The identity feature hidden code includes the shallow hidden code, the middle hidden code and the deep hidden code, and n1 and n2 are positive integers smaller than N.

The identity encoding network extracts features from the source face picture at multiple levels, obtaining identity feature hidden codes with different receptive fields. The shallow hidden code is obtained at low resolution after passing through fewer encoding layers, so its receptive field is small: a pixel value in the shallow hidden code maps to a small pixel region of the source face picture and its features are relatively coarse, so the shallow hidden code characterizes facial appearance features of the source face picture, such as the face contour, hairstyle and pose. As the number of encoding layers and the resolution increase, the receptive field of the middle hidden code grows through repeated convolutions: a pixel value in the middle hidden code maps to a larger pixel region of the source face picture and the characterized features become more detailed, so the middle hidden code characterizes finer facial features of the source face picture, such as the opening and closing of the eyes and the details of the facial features. As the number of encoding layers continues to increase and the resolution increases further, a pixel value in the deep hidden code maps to the largest pixel region of the source face picture, and the deep hidden code characterizes the finest identity features of the source face picture, such as the skin color and pupil color of the face.

The shallow hidden code output by the identity encoding network has size a1, the middle hidden code has size a2, and the deep hidden code has size a3. In some embodiments, a1 = a2 = a3. In other embodiments, a1, a2 and a3 are unequal, and the face fusion model divides the sizes of the shallow, middle and deep hidden codes according to the characteristics of the identity encoding network; for example, if the structural characteristics of the identity encoding network make the feature entanglement in the shallow hidden code small, the size of the shallow hidden code is increased and the sizes of the middle hidden code and the deep hidden code are reduced.

In some embodiments, the identity encoding network has 6 encoding layers with n1 = 2 and n2 = 4, so the shallow hidden code is output by the 2nd encoding layer, the middle hidden code by the 4th encoding layer, and the deep hidden code by the 6th encoding layer. The identity feature hidden code consists of the shallow, middle and deep hidden codes. In some embodiments, the shallow hidden code obtained by the identity encoding network has size 8*512, the middle hidden code has size 6*512, and the deep hidden code has size 2*512, so the identity feature hidden code has size 16 (8+6+2) * 512.
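Purely as an illustration of the sizes mentioned above, the shallow, middle and deep hidden codes can be stacked into the 16*512 identity feature hidden code as follows (the tensor layout is an assumption of this example):

    import torch

    # Hypothetical outputs of the 2nd, 4th and 6th encoding layers for one picture.
    shallow = torch.randn(8, 512)   # facial appearance features
    middle = torch.randn(6, 512)    # fine facial features
    deep = torch.randn(2, 512)      # face color and micro features

    identity_code = torch.cat([shallow, middle, deep], dim=0)  # shape (16, 512)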
In some embodiments, the decoding network includes M decoding layers, M being an integer greater than 1. Performing fusion based on the identity feature hidden code and the attribute feature hidden code to generate the fused face picture includes: performing affine transformation on the identity feature hidden code to generate M groups of control vectors; and decoding the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture. The input of the 1st decoding layer includes the attribute feature hidden code and the 1st group of control vectors, the input of the (i+1)-th decoding layer includes the output of the i-th decoding layer and the (i+1)-th group of control vectors, and the output of the M-th decoding layer includes the fused face picture, i being a positive integer smaller than M.

After the affine transformation is performed on the identity feature hidden code, the relative positional relationships among the features in the identity feature hidden code do not change; the affine transformation filters out where a feature appears while retaining the relative relationships among features. The control vectors are used to control the style of the fused face picture.

In some embodiments, performing affine transformation on the identity feature hidden code to generate M groups of control vectors includes: dividing the identity feature hidden code into M groups of identity feature vectors, and performing affine transformation on the M groups of identity feature vectors respectively to generate the M groups of control vectors, where each group of control vectors includes at least two control vectors and different control vectors are used to characterize identity features of different dimensions.

In some embodiments, the M groups of control vectors are obtained by grouping every two adjacent features in the identity feature hidden code. For example, if the identity feature hidden code has size 16*512, two adjacent columns of identity features (each 1*512) are divided into one control vector group. Identity features of different dimensions can represent different categories of identity features of the source face picture. In some embodiments, identity features of different dimensions have different receptive fields and therefore characterize features of different granularity. In other embodiments, identity features of different dimensions have the same receptive field, in which case they characterize different types of identity features of the source face picture; for example, a control vector group may include a feature characterizing the eye shape of the source face picture and a feature characterizing the nose shape of the source face picture.

In some embodiments, decoding the attribute feature hidden code and the M groups of control vectors through the M decoding layers to generate the fused face picture includes: in the i-th of the M decoding layers, receiving the output of the (i-1)-th layer and the control vector group corresponding to the i-th layer, the control vector group including a first control vector and a second control vector; the decoding layer first performs an adaptive instance normalization operation on the input vector from the (i-1)-th layer and the first control vector to obtain an intermediate vector, convolves the intermediate vector with a 3x3 convolution kernel, performs an adaptive instance normalization operation on the convolved vector and the second control vector, and inputs the resulting vector to the (i+1)-th layer, completing the decoding operation of one decoding layer.

In some embodiments, the decoding network includes 8 decoding layers. The decoding network takes the attribute feature code as the input of the 1st decoding layer, repeats the above single-layer decoding step 8 times, and outputs a 512*512 fused face picture at the 8th decoding layer.

Encoding through multiple encoding layers avoids mutual entanglement among the feature hidden codes, and decoding the attribute feature hidden code and the control vector groups through the decoding network makes it possible to control the identity features of the fused face picture through the control vectors and to generate a realistic and natural fused face picture.
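The per-layer decoding operation described above can be sketched as follows, assuming PyTorch. Splitting each control vector into a per-channel scale and bias, the affine layers and the channel counts are assumptions of this example; the grouping into pairs of 1*512 identity features follows the example sizes given above.

    import torch
    import torch.nn as nn

    def adain_vec(x, style, eps=1e-5):
        # AdaIN driven by a control vector: the vector is split into a
        # per-channel scale and bias (an assumption of this sketch).
        scale, bias = style.chunk(2, dim=1)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + eps
        return scale[:, :, None, None] * (x - mu) / sigma + bias[:, :, None, None]

    class DecodingLayer(nn.Module):
        """One decoding layer: AdaIN with the first control vector,
        a 3x3 convolution, then AdaIN with the second control vector."""

        def __init__(self, ch: int, code_dim: int = 512):
            super().__init__()
            # Affine transformations turning the two identity feature vectors
            # of this layer's group into two control vectors.
            self.affine1 = nn.Linear(code_dim, 2 * ch)
            self.affine2 = nn.Linear(code_dim, 2 * ch)
            self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

        def forward(self, h, id_pair):
            # id_pair: the two adjacent 1x512 identity features for this layer,
            # shape (N, 2, 512).
            h = adain_vec(h, self.affine1(id_pair[:, 0]))
            h = self.conv(h)
            return adain_vec(h, self.affine2(id_pair[:, 1]))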
下面,通过实施例对人脸融合模型的训练流程进行介绍说明,有关该人脸融合模型的使用过程中涉及的内容和训练过程中涉及的内容是相互对应的,两者互通,如在一侧未作详细说明的地方,可以参考另一侧的描述说明。
请参考图4,其示出了本申请一个实施例提供的人脸融合模型的训练方法的流程图,本方法各步骤的执行主体可以服务器10,也可以是一台计算机,为了描述方便,下面以计算机设备作为执行主体,该方法可以包括如下几个步骤(410-470)中的至少一个步骤:
步骤410,获取人脸融合模型的训练样本,训练样本包括源人脸图片样本和目标人脸图片样本。
人脸融合模型包括生成网络和判别网络,生成网络包括身份编码网络、属性编码网络和解码网络。
人脸融合模型是一个生成式对抗网络模型,在一些实施例中,人脸融合模型的输入包括源人脸图片样本和目标人脸图片样本。每一个训练样本包括两张图片样本,一张作为源人脸图片样本,另一张作为目标人脸图片样本。使用上述训练样本对人脸融合模型进行训练,可以训练得到能够生成真实融合人脸图片的人脸融合模型。一个训练样本组中的两个图片样本可以是不同的人物,也可以具有不同的属性特征。使用多个训练样本组对人脸融合模型进行训练,使得经过训练的人脸融合模型在输入的源人脸图片样本与目标人脸图片样本差异较大的情况下,依旧能生成真实自然的融合人脸图片。在一些实施例中,训练样本来自高清人脸数据集(Flickr Faces High Quality,FFHQ),该数据集中包括不同性别,人脸角度,表情,妆容的人脸图片,将上述高清人脸数据集分成源人脸图片样本组和目标人脸图片样本组,每一个训练样本组在上述源人脸图片样本组和目标人脸图片样本组中分别选择一张图片样本作为该训练样本组的源人脸图片样本和目标人脸图片样本。
步骤420,通过身份编码网络获取源人脸图片样本的身份特征隐码,身份特征隐码是用于表征源人脸图片样本中人物身份特征。
在训练过程中,不同的源人脸图片样本之间人脸的角度、身份特征之间存在差异,通过训练,身份编码网络能将上述特征信息进行解耦,使得通过身份编码网络编码获得的源人脸 图片样本身份特征的隐码的特征纠缠少。
步骤430,通过属性编码网络获取目标人脸图片样本的属性特征隐码,属性特征隐码用于表征目标人脸图片样本中人物属性特征。
在训练过程中,不同的目标人脸图片样本之间人脸的姿态,妆容,环境因素之间存在差异,通过训练,属性编码网络能将上述特征信息进行解耦,使得通过属性编码网络编码获得的目标人脸图片样本属性特征的隐码的特征纠缠少。
步骤440,通过解码网络基于身份特征隐码和属性特征隐码进行融合,生成融合人脸图片样本。
解码网络是经过预训练的网络,在人脸融合模型的训练过程中,解码网络不参与训练,解码网络仅仅用于将身份特征隐码和属性特征隐码进行解码,生成高清逼真的人脸融合图片样本。
在一些实施例中,解码网络采用StyleGAN网络结构中的解码网络对身份特征隐码和属性特征隐码进行解码。
步骤450,通过判别网络确定待判别样本是否由生成网络生成,待判别样本包括融合人脸图片样本。
判别网络采用逐层增长的方式判别待判别图像是否为真实图片。判别网络从像素值为4*4的RGB图像开始渐进式增长图片的像素值,将待判别图像像素扩大至8*8,6*16,32*32直至达到待判别图像大小为止。
在一些实施例中,判别网络对待判别图像进行判断后,输出待判别图像是真实图片或生成网络生成图片的预测值。
步骤460,基于判别网络的判别结果确定判别网络损失,以及基于判别网络损失对判别网络中的参数进行调整。
判别网络损失用于衡量判别网络性能。在一些实施例中,基于该判别网络损失,采用梯度下降算法对判别网络中的参数进行优化。
步骤470,基于融合人脸图片样本、源人脸图片样本、目标人脸图片样本和判别网络的判别结果确定生成网络损失,以及基于生成网络损失对生成网络中的参数进行调整。
由于生成网络中的解码网络不参与训练,因此生成网络损失用于衡量身份编码网络和属性编码网络的性能。在一些实施例中,基于该生成网络损失,采用梯度下降算法对身份编码网络中的参数和属性编码网络中的参数分别进行优化。
综上所述,通过生成网络获取训练样本组,并通过损失函数调节人脸融合模型的参数,通过生成网络与对抗网络进行对抗训练,使得训练后的人脸融合模型具有较好的鲁棒性,能够适应特征差异较大的源人脸图片样本和目标人脸图片样本,融合出真实自然的融合人脸图片样本。
请参考图5,其示出了本申请一个实施例提供的人脸融合模型的训练方法的示意图。
在一些实施例中,身份编码网络包括N个串联的编码层,N为大于1的整数;通过身份编码网络获取源人脸图片样本的身份特征隐码,包括:通过身份编码网络中的第1个至第n1个编码层,对源人脸图片样本进行编码处理,得到浅层隐码;其中,浅层隐码用于表征源人脸图片样本的面部外观特征;通过身份编码网络中的第n1个至第n2个编码层,对浅层隐码进行编码处理,得到中层隐码;其中,中层隐码用于表征源人脸图片样本的精细面部特征;通过身份编码网络中的第n2个至第N个编码层,对中层隐码进行编码处理,得到深层隐码;其中,深层隐码用于表征源人脸图片样本的人脸颜色特征和人脸微观特征;其中,身份特征隐码包括:浅层隐码、中层隐码和深层隐码,n1、n2为小于N的正整数。
关于身份编码网络的编码过程请参考上一个实施例,在此不进行赘述。
在一些实施例中,解码网络包括M个解码层,M为大于1的整数;通过解码网络基于身 份特征隐码和属性特征隐码进行融合,生成融合人脸图片样本,包括:对身份特征隐码进行仿射变换,生成M组控制向量;通过M个解码层对属性特征隐码和M组控制向量进行解码处理,生成融合人脸图片样本;其中,第1个解码层的输入包括属性特征隐码和第1组控制向量,第i+1个解码层的输入包括第i个解码层的输出和第i+1组控制向量,第M个解码层的输出包括融合人脸图片样本,i为小于M的正整数。
在一些实施例中,对身份特征隐码进行仿射变换,生成M组控制向量,包括:将身份特征隐码划分为M组身份特征向量;对M组身份特征向量分别进行仿射变换,生成M组控制向量;其中,每组控制向量包括至少两个控制向量,不同的控制向量用于表征不同维度的身份特征。
关于解码网络的解码过程请参考上一个实施例,在此不进行赘述。
In some embodiments, the discriminator network loss is determined based on the discrimination result. This discriminator loss is the adversarial loss of the discriminator network and can be calculated by the following formula:

L_d = log(exp(D(G(x_s))) + 1) + log(exp(D(x)) + 1)

where x denotes a real picture sample, G(x_s) denotes the fused face picture sample generated by the generation network, D(G(x_s)) denotes the discrimination result of the discriminator network for the fused face picture sample, and D(x) denotes the discrimination result of the discriminator network for the real face picture sample. In some implementations, the discrimination result of the discriminator network takes the values 0 and 1: a result of 0 indicates that the discriminator network considers the picture under discrimination to be generated by the generation network (fake), and a result of 1 indicates that the discriminator network considers the picture under discrimination to be real.
In some embodiments, determining the generation network loss based on the fused face picture sample, the source face picture sample, the target face picture sample and the discrimination result of the discriminator network includes: determining a perceptual similarity loss based on the target face picture sample and the fused face picture sample, the perceptual similarity loss being used to characterize the picture style difference between the target face picture sample and the fused face picture sample; determining a multi-scale identity feature loss based on the source face picture sample and the fused face picture sample, the multi-scale identity feature loss being used to characterize the identity feature difference between the source face picture sample and the fused face picture sample; determining a face pose loss based on the target face picture sample and the fused face picture sample, the face pose loss being used to describe the face pose difference between the target face picture sample and the fused face picture sample; determining a generation network adversarial loss based on the discrimination result; and determining the generation network loss according to the perceptual similarity loss, the multi-scale identity feature loss, the face pose loss and the adversarial loss.

In some embodiments, determining the perceptual similarity loss based on the target face picture sample and the fused face picture sample includes: extracting the visual features of the target face picture sample and the visual features of the fused face picture sample respectively through a visual feature extraction network, and calculating the similarity between the two sets of visual features to obtain the perceptual similarity loss.

The perceptual similarity loss can be calculated by the following formula:

L_LPIPS = ||F(x_t) - F(y_s2t)||_2

where x_t denotes the target face picture sample, y_s2t denotes the fused face picture sample, F(x_t) is the visual feature of the target face picture sample extracted by the visual feature extraction network, and F(y_s2t) is the visual feature of the fused face picture sample extracted by the visual feature extraction network.

In some embodiments, determining the multi-scale identity feature loss based on the source face picture sample and the fused face picture sample includes: extracting the identity feature hidden code of the source face picture sample and the identity feature hidden code of the fused face picture sample respectively through an identity feature extraction network, and calculating the similarity between the two identity feature hidden codes to obtain the multi-scale identity feature loss.

The multi-scale identity feature loss can be calculated by the following formula:

L_ID = Σ_i (1 - cos(N_i(x_s), N_i(y_s2t)))

where x_s denotes the source face picture sample, y_s2t denotes the fused face picture sample, N_i(x_s) is the identity feature of the source face picture sample extracted by the identity feature extraction network at the i-th scale, and N_i(y_s2t) is the identity feature of the fused face picture sample extracted at the i-th scale. In some embodiments, a VGG (Visual Geometry Group) face network is used as the identity feature extraction network to extract the identity features of the source face picture sample and the fused face picture sample respectively.
In some embodiments, determining the face pose loss based on the target face picture sample and the fused face picture sample includes:

extracting the face pose Euler angles of the target face picture sample and the face pose Euler angles of the fused face picture sample respectively through a face pose prediction network;

calculating the similarity between the face pose Euler angles of the target face picture sample and the face pose Euler angles of the fused face picture sample to obtain the face pose loss.

The face pose loss can be calculated by the following formula:

L_POSE = ||E(x_t) - E(y_s2t)||_2

where x_t denotes the target face picture sample, y_s2t denotes the fused face picture sample, E(x_t) is the face pose Euler angles of the target face picture sample extracted by the face pose prediction network, and E(y_s2t) is the face pose Euler angles of the fused face picture sample extracted by the face pose prediction network.

In some embodiments, an MTCNN (Multi-task Cascaded Convolutional Networks) network is used as the face pose prediction network to extract the face pose Euler angles of the target face picture sample and of the fused face picture sample respectively.

In some embodiments, the adversarial loss of the generation network determined based on the discrimination result can be calculated by the following formula:

L_g = -log(exp(D(G(x_s))) + 1)

where G(x_s) denotes the fused face picture sample generated by the generation network, and D(G(x_s)) denotes the discrimination result of the discriminator network for the fused face picture sample.
In some embodiments, the training process of the face fusion model is as follows (a simplified code sketch is given after this description):

1. Initialize the parameters of the identity encoding network, the attribute encoding network and the discriminator network;

2. Sample m training sample groups from the training sample set, each group containing one source face picture sample and one target face picture sample;

3. For each training sample group, obtain the identity feature code of the source face picture sample through the identity encoding network, obtain the attribute feature code of the target face picture sample through the attribute encoding network, and decode these codes through the decoding network to generate a fused face picture sample;

4. After the m fused face picture samples have been generated, fix the generation network and sample m real picture samples from the training sample set;

5. Discriminate the m fused face picture samples and the m real picture samples respectively through the discriminator network, and output the discrimination results;

6. Determine the loss function of the discriminator network from its discrimination results using a logistic regression loss function, and optimize the parameters of the discriminator network by gradient descent;

7. Determine the generator loss function from the fused face picture samples, the source face picture samples, the target face picture samples and the discrimination results of the discriminator network, optimize the parameters of the generation network by gradient descent according to the generator loss function, and complete one round of training;

8. At the end of a round of training, calculate the total loss of the face fusion model by the following formula:

L_total = W_LPIPS * L_LPIPS + W_ID * L_ID + W_POSE * L_POSE + W_gan * (L_g + L_d)

where W_LPIPS, W_ID, W_POSE and W_gan are the weights of the corresponding losses in the total loss; in some embodiments, the values of W_LPIPS, W_ID, W_POSE and W_gan are 1, 5, 5 and 5 respectively.

9. Stop training when the total loss of the face fusion model reaches its minimum.

In an actual training process, training on a training sample set for 16 epochs can yield a face fusion model capable of generating realistic fused face pictures.

By introducing losses from multiple aspects, such as the perceptual similarity loss, the multi-scale identity feature loss, the generative adversarial loss and the face pose loss, the face fusion model can adjust its parameters better during training.
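Assuming that the generator (the two encoders plus the frozen decoding network), the discriminator and the loss helpers sketched earlier are available, one round of the alternating training described in steps 1 to 9 might look like the following sketch; the optimizer choice, learning rate and data loader are assumptions of this example.

    import torch
    import torch.nn.functional as F

    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    opt_g = torch.optim.Adam(list(identity_encoder.parameters())
                             + list(attribute_encoder.parameters()), lr=1e-4)

    for source, target, real in loader:  # m sample groups per step
        # Discriminator update with the generation network fixed,
        # following the L_d formula above.
        with torch.no_grad():
            fused = generator(source, target)
        l_d = (F.softplus(discriminator(fused)).mean()
               + F.softplus(discriminator(real)).mean())
        opt_d.zero_grad()
        l_d.backward()
        opt_d.step()

        # Encoder update; the pretrained decoding network stays frozen.
        fused = generator(source, target)
        l_g = generator_loss(discriminator(fused), fused, source, target,
                             lpips_net, vgg_face, pose_net)
        opt_g.zero_grad()
        l_g.backward()
        opt_g.step()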
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。
请参考图6,其示出了本申请一个实施例提供的人脸图片的融合装置的框图。该装置具有实现上述人脸图片的融合方法的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的电子设备,也可以设置在电子设备中。该装置600可以包括:人脸图片获取模块610、身份特征获取模块620、属性特征获取模块630、融合图片生成模块640。
人脸图片获取模块610,配置为获取源人脸图片和目标人脸图片。
身份特征获取模块620,配置为获取所述源人脸图片的身份特征隐码,所述身份特征隐码用于表征所述源人脸图片中人物身份特征。
属性特征获取模块630,配置为获取所述目标人脸图片的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片中人物属性特征。
融合图片生成模块640,配置为基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片。
在一些实施例中,所述融合人脸图片由人脸融合模型生成,所述人脸融合模型包括身份编码网络、属性编码网络和解码网络;其中,所述身份编码网络用于获取所述源人脸图片的身份特征隐码;所述属性编码网络用于获取所述目标人脸图片的属性特征隐码;所述解码网络用于基于所述身份特征隐码和所述属性特征隐码进行融合,生成所述融合人脸图片。
在一些实施例中,所述身份编码网络包括N个串联的编码层,N为大于1的整数;所述身份特征获取模块620,配置为:通过所述身份编码网络中的第1个至第n1个编码层,对所述源人脸图片进行编码处理,得到浅层隐码;其中,所述浅层隐码用于表征所述源人脸图片的面部外观特征;通过所述身份编码网络中的第n1个至第n2个编码层,对所述浅层隐码进行编码处理,得到中层隐码;其中,所述中层隐码用于表征所述源人脸图片的精细面部特征;通过所述身份编码网络中的第n2个至第N个编码层,对所述中层隐码进行编码处理,得到深层隐码;其中,所述深层隐码用于表征所述源人脸图片的人脸颜色特征和人脸微观特征;其中,所述身份特征隐码包括:所述浅层隐码、所述中层隐码和所述深层隐码,n1、n2为小于N的正整数。
在一些实施例中,所述融合图片生成模块640包括:控制向量生成单元,配置为对所述身份特征隐码进行仿射变换,生成M组控制向量;融合单元,配置为通过所述M个解码层对所述属性特征隐码和所述M组控制向量进行解码处理,生成所述融合人脸图片;其中,第1个解码层的输入包括所述属性特征隐码和第1组控制向量,第i+1个解码层的输入包括第i个解码层的输出和第i+1组控制向量,第M个解码层的输出包括所述融合人脸图片,i为小于M的正整数。
在一些实施例中,所述融合单元,配置为将所述身份特征隐码划分为M组身份特征向量;对所述M组身份特征向量分别进行仿射变换,生成所述M组控制向量;其中,每组所述控制向量包括至少两个控制向量,不同的控制向量用于表征不同维度的身份特征。
请参考图7,其示出了本申请一个实施例提供的人脸融合模型的训练装置的框图。该装置具有实现上述人脸融合模型的训练方法的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的分析设备,也可以设置在分析设备中。该装置700可以包括:训练样本获取模块710、身份特征获取模块720、属性特征获取模块730、融合图片生成模块740、人脸图片判别模块750、第一参数调整模块760和第二参数调整模块770。
训练样本获取模块710,配置为获取人脸融合模型的训练样本,所述训练样本包括源人脸图片样本和目标人脸图片样本。
身份特征获取模块720,配置为通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,所述身份特征隐码是用于表征所述源人脸图片样本中人物身份特征。
属性特征获取模块730,配置为通过所述属性编码网络获取所述目标人脸图片样本的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片样本中人物属性特征。
融合图片生成模块740,配置为通过所述解码网络基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片样本。
人脸图片判别模块750,配置为通过所述判别网络确定待判别样本是否由所述生成网络生成,所述待判别样本包括所述融合人脸图片样本。
第一参数调整模块760,配置为基于所述判别网络的判别结果确定判别网络损失,以及基于所述判别网络损失对所述判别网络中的参数进行调整。
第二参数调整模块770,配置为基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,以及基于所述生成网络损失对所述生成网络中的参数进行调整。
在一些实施例中,所述解码网络包括M个解码层,M为大于1的整数,所述身份特征获取模块720,配置为:通过所述身份编码网络中的第1个至第n1个编码层,对所述源人脸图片样本进行编码处理,得到浅层隐码;其中,所述浅层隐码用于表征所述源人脸图片样本的面部外观特征;通过所述身份编码网络中的第n1个至第n2个编码层,对所述浅层隐码进行编码处理,得到中层隐码;其中,所述中层隐码用于表征所述源人脸图片样本的精细面部特征;通过所述身份编码网络中的第n2个至第N个编码层,对所述中层隐码进行编码处理,得到深层隐码;其中,所述深层隐码用于表征所述源人脸图片样本的人脸颜色特征和人脸微观特征;其中,所述身份特征隐码包括:所述浅层隐码、所述中层隐码和所述深层隐码,n1、n2为小于N的正整数。
在一些实施例中,所述解码网络包括M个解码层,M为大于1的整数,所述样本融合图片生成模块740,配置为:对所述身份特征隐码进行仿射变换,生成M组控制向量;通过所述M个解码层对所述属性特征隐码和所述M组控制向量进行解码处理,生成所述融合人脸图片样本;其中,第1个解码层的输入包括所述属性特征隐码和第1组控制向量,第i+1个解码层的输入包括第i个解码层的输出和第i+1组控制向量,第M个解码层的输出包括所述融合人脸图片样本,i为小于M的正整数。
在一些实施例中,所述第二参数调整模块770,包括:第一损失函数单元,配置为基于所述目标人脸图片样本和所述融合人脸图片样本确定感知相似度损失,所述感知相似度损失用于表征所述目标人脸图片样本和所述融合人脸图片样本之间的图片风格差异;第二损失函数单元,配置为基于所述源人脸图片样本和所述融合人脸图片样本确定所述多尺度身份特征损失,所述多尺度身份特征损失用于表征所述源人脸图片样本和所述融合人脸图片样本之间的身份特征差异;第三损失函数单元,配置为基于所述目标人脸图片样本和所述融合人脸图片样本确定人脸姿态损失,所述人脸姿态损失用于基于所述判别结果确定生成网络对抗损失;根据所述感知相似度损失、所述多尺度身份特征损失、所述人脸姿态损失和所述网络对抗损失,确定所述生成网络损失。
在一些实施例中,所述第一损失函数单元,配置为通过视觉特征提取网络,分别提取所述目标人脸图片样本的视觉特征和所述融合人脸图片样本的视觉特征;计算所述目标人脸图片样本的视觉特征和所述融合人脸图片样本的视觉特征之间的相似度,得到所述感知相似度损失。
在一些实施例中,所述第二损失函数单元,配置为通过所述身份特征提取网络,分别提取所述源人脸图片样本的身份特征隐码和所述融合人脸图片样本的身份特征隐码;计算所述源人脸图片样本的身份特征隐码和所述融合人脸图片样本的身份特征隐码之间的相似度,得到所述多尺度身份特征损失。
在一些实施例中,所述第三损失函数单元,配置为通过人脸姿态预测网络,分别提取所述目标人脸图片样本的人脸姿态欧拉角和所述融合人脸图片样本的人脸姿态欧拉角;计算通 所述目标人脸图片样本的人脸姿态欧拉角和所述融合人脸图片样本的人脸姿态欧拉角之间的相似度,得到所述人脸姿态损失。
需要说明的是,上述实施例提供的装置,在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内容结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
请参考图8,其示出了本申请一个实施例提供的计算机设备800的结构框图。该计算机设备800可以用于实施上述融合人脸的生成方法;也可以用于实施上述人脸融合模型的训练方法。
通常,计算机设备800包括有:处理器801和存储器802。
处理器801可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。在一些实施例中,处理器801可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器801还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器802还可包括高速随机存取存储器,以及非易失性存储器。
本领域技术人员可以理解,图8中示出的结构并不构成对计算机设备800的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在示例中实施例中,还提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有计算机程序。所述计算机程序经配置以由一个或者一个以上处理器执行,以实现上述人脸图片的融合方法,或者实现上述人脸融合模型的训练方法。计算机设备可以称为图像处理设备,用于实现人脸图片融合方法。计算机设备也可以称为模型训练设备,用于实现人脸融合模型的训练方法。
在示例性实施例中,还提供了一种计算机可读存储介质,所述存储介质中存储有计算机程序,所述计算机程序在被计算机设备的处理器执行时实现上述人脸图片的融合方法,或者实现上述人脸融合模型的训练方法。
可选地,上述计算机可读存储介质可以是ROM(Read-Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)等。
在示例性实施例中,还提供了一种计算机程序产品,当所述计算机程序产品在计算机设备上运行时,使得计算机设备执行如上述人脸图片的融合方法,或者上述人脸融合模型的训练方法。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。另外,本文中描述的步骤编号,仅示例性示出了步骤间的一种可能的执行先后顺序,在一些其它实施例中,上述步骤也可以不按照编号顺序来执行,如两个不同编号的步骤同时执行,或者两个不同编号的步骤按照与图示相反的顺序执行,本申请实施例对此不作限定。

Claims (17)

  1. 一种人脸图片的融合方法,所述方法由计算机设备执行,所述方法包括:
    获取源人脸图片和目标人脸图片;
    获取所述源人脸图片的身份特征隐码,所述身份特征隐码用于表征所述源人脸图片中人物身份特征;
    获取所述目标人脸图片的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片中人物属性特征;
    基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片。
  2. 根据权利要求1所述的方法,其中,所述融合人脸图片由人脸融合模型生成,所述人脸融合模型包括身份编码网络、属性编码网络和解码网络;其中,
    所述身份编码网络用于获取所述源人脸图片的身份特征隐码;
    所述属性编码网络用于获取所述目标人脸图片的属性特征隐码;
    所述解码网络用于基于所述身份特征隐码和所述属性特征隐码进行融合,生成所述融合人脸图片。
  3. 根据权利要求2所述的方法,其中,所述身份编码网络包括N个串联的编码层,N为大于1的整数;所述获取所述源人脸图片的身份特征隐码,包括:
    通过所述身份编码网络中的第1个至第n1个编码层,对所述源人脸图片进行编码处理,得到浅层隐码;其中,所述浅层隐码用于表征所述源人脸图片的面部外观特征;
    通过所述身份编码网络中的第n1个至第n2个编码层,对所述浅层隐码进行编码处理,得到中层隐码;其中,所述中层隐码用于表征所述源人脸图片的精细面部特征;
    通过所述身份编码网络中的第n2个至第N个编码层,对所述中层隐码进行编码处理,得到深层隐码;其中,所述深层隐码用于表征所述源人脸图片的人脸颜色特征和人脸微观特征;
    其中,所述身份特征隐码包括:所述浅层隐码、所述中层隐码和所述深层隐码,n1、n2为小于N的正整数。
  4. 根据权利要求2所述的方法,其中,所述解码网络包括M个解码层,M为大于1的整数;所述基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片,包括:
    对所述身份特征隐码进行仿射变换,生成M组控制向量;
    通过所述M个解码层对所述属性特征隐码和所述M组控制向量进行解码处理,生成所述融合人脸图片;
    其中,第1个解码层的输入包括所述属性特征隐码和第1组控制向量,第i+1个解码层的输入包括第i个解码层的输出和第i+1组控制向量,第M个解码层的输出包括所述融合人脸图片,i为小于M的正整数。
  5. 根据权利要求4所述的方法,其中,所述对所述身份特征隐码进行仿射变换,生成M组控制向量,包括:
    将所述身份特征隐码划分为M组身份特征向量;
    对所述M组身份特征向量分别进行仿射变换,生成所述M组控制向量;
    其中,每组所述控制向量包括至少两个控制向量,不同的控制向量用于表征不同维度的身份特征。
  6. 一种人脸融合模型的训练方法,所述方法由计算机设备执行,所述人脸融合模型包括生成网络和判别网络,所述生成网络包括身份编码网络、属性编码网络和解码网络;所述方法包括:
    获取人脸融合模型的训练样本,所述训练样本包括源人脸图片样本和目标人脸图片样本;
    通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,所述身份特征隐码是用于表征所述源人脸图片样本中人物身份特征;
    通过所述属性编码网络获取所述目标人脸图片样本的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片样本中人物属性特征;
    通过所述解码网络基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片样本;
    通过所述判别网络确定待判别样本是否由所述生成网络生成,所述待判别样本包括所述融合人脸图片样本;
    基于所述判别网络的判别结果确定判别网络损失,以及基于所述判别网络损失对所述判别网络中的参数进行调整;
    基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,以及基于所述生成网络损失对所述生成网络中的参数进行调整。
  7. 根据权利要求6所述的方法,其中,所述身份编码网络包括N个串联的编码层,N为大于1的整数;所述通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,包括:
    通过所述身份编码网络中的第1个至第n1个编码层,对所述源人脸图片样本进行编码处理,得到浅层隐码;其中,所述浅层隐码用于表征所述源人脸图片样本的面部外观特征;
    通过所述身份编码网络中的第n1个至第n2个编码层,对所述浅层隐码进行编码处理,得到中层隐码;其中,所述中层隐码用于表征所述源人脸图片样本的精细面部特征;
    通过所述身份编码网络中的第n2个至第N个编码层,对所述中层隐码进行编码处理,得到深层隐码;其中,所述深层隐码用于表征所述源人脸图片样本的人脸颜色特征和人脸微观特征;
    其中,所述身份特征隐码包括:所述浅层隐码、所述中层隐码和所述深层隐码,n1、n2为小于N的正整数。
  8. 根据权利要求6所述的方法,其中,所述解码网络包括M个解码层,M为大于1的整数;所述通过所述解码网络基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片样本,包括:
    对所述身份特征隐码进行仿射变换,生成M组控制向量;
    通过所述M个解码层对所述属性特征隐码和所述M组控制向量进行解码处理,生成所述融合人脸图片样本;
    其中,第1个解码层的输入包括所述属性特征隐码和第1组控制向量,第i+1个解码层的输入包括第i个解码层的输出和第i+1组控制向量,第M个解码层的输出包括所述融合人脸图片样本,i为小于M的正整数。
  9. 根据权利要求6所述的方法,其中,所述基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,包括:
    基于所述目标人脸图片样本和所述融合人脸图片样本确定感知相似度损失,所述感知相似度损失用于表征所述目标人脸图片样本和所述融合人脸图片样本之间的图片风格差异;
    基于所述源人脸图片样本和所述融合人脸图片样本确定所述多尺度身份特征损失,所述多尺度身份特征损失用于表征所述源人脸图片样本和所述融合人脸图片样本之间的身份特征 差异;
    基于所述目标人脸图片样本和所述融合人脸图片样本确定人脸姿态损失,所述人脸姿态损失用于描述所述目标人脸图片样本与所述融合人脸图片样本之间的人脸姿态差异;
    基于所述判别结果确定生成网络对抗损失;
    根据所述感知相似度损失、所述多尺度身份特征损失、所述人脸姿态损失和所述网络对抗损失,确定所述生成网络损失。
  10. 根据权利要求9所述的方法,其中,所述基于所述目标人脸图片样本和所述融合人脸图片样本确定感知相似度损失,包括:
    通过视觉特征提取网络,分别提取所述目标人脸图片样本的视觉特征和所述融合人脸图片样本的视觉特征;
    计算所述目标人脸图片样本的视觉特征和所述融合人脸图片样本的视觉特征之间的相似度,得到所述感知相似度损失。
  11. 根据权利要求9所述的方法,其中,所述基于所述源人脸图片样本和所述融合人脸图片样本确定所述多尺度身份特征损失,包括:
    通过所述身份特征提取网络,分别提取所述源人脸图片样本的身份特征隐码和所述融合人脸图片样本的身份特征隐码;
    计算所述源人脸图片样本的身份特征隐码和所述融合人脸图片样本的身份特征隐码之间的相似度,得到所述多尺度身份特征损失。
  12. 根据权利要求9所述的方法,其中,所述基于所述目标人脸图片样本和所述融合人脸图片样本确定人脸姿态损失,包括:
    通过人脸姿态预测网络,分别提取所述目标人脸图片样本的人脸姿态欧拉角和所述融合人脸图片样本的人脸姿态欧拉角;
    计算所述目标人脸图片样本的人脸姿态欧拉角和所述融合人脸图片样本的人脸姿态欧拉角之间的相似度,得到所述人脸姿态损失。
  13. 一种人脸图片的融合装置,所述装置包括:
    人脸图片获取模块,配置为获取源人脸图片和目标人脸图片;
    身份特征获取模块,配置为获取所述源人脸图片的身份特征隐码,所述身份特征隐码用于表征所述源人脸图片中人物身份特征;
    属性特征获取模块,配置为获取所述目标人脸图片的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片中人物属性特征;
    融合图片生成模块,配置为基于所述身份特征隐码和所述属性特征隐码进行融合,生成融合人脸图片。
  14. 一种人脸融合模型的训练装置,所述人脸融合模型包括生成网络和判别网络,所述生成网络包括身份编码网络、属性编码网络和解码网络;所述装置包括:
    训练样本获取模块,配置为获取人脸融合模型的训练样本,所述训练样本包括源人脸图片样本和目标人脸图片样本;
    身份特征获取模块,配置为通过所述身份编码网络获取所述源人脸图片样本的身份特征隐码,所述身份特征隐码是用于表征所述源人脸图片样本中人物身份特征;
    属性特征获取模块,配置为通过所述属性编码网络获取所述目标人脸图片样本的属性特征隐码,所述属性特征隐码用于表征所述目标人脸图片样本中人物属性特征;
    融合图片生成模块,配置为通过所述解码网络基于所述身份特征隐码和所述属性特征隐 码进行融合,生成融合人脸图片样本;
    人脸图片判别模块,配置为通过所述判别网络确定待判别样本是否由所述生成网络生成,所述待判别样本包括所述融合人脸图片样本;
    第一参数调整模块,配置为基于所述判别网络的判别结果确定判别网络损失,以及基于所述判别网络损失对所述判别网络中的参数进行调整;
    第二参数调整模块,配置为基于所述融合人脸图片样本、所述源人脸图片样本、所述目标人脸图片样本和所述判别网络的判别结果确定生成网络损失,以及基于所述生成网络损失对所述生成网络中的参数进行调整。
  15. 一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有计算机程序,所述计算机程序由所述处理器加载并执行以实现如权利要1至5任一项所述的方法,或实现如权利要求6至12任一项所述的方法。
  16. 一种计算机可读存储介质,所述存储介质中存储有计算机程序,所述计算机程序由处理器加载并执行以实现如权利要求1至5任一项所述的方法,或实现如权利要求6至12任一项所述的方法。
  17. 一种计算机程序产品,当计算机程序产品在计算机设备上运行时,使得计算机设备执行如权利要求1至5任一项所述的方法,或实现如权利要求6至12任一项所述的方法。
PCT/CN2022/116786 2021-09-16 2022-09-02 人脸图片的融合方法、装置、设备及存储介质 WO2023040679A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111089159.1A CN113850168A (zh) 2021-09-16 2021-09-16 人脸图片的融合方法、装置、设备及存储介质
CN202111089159.1 2021-09-16

Publications (1)

Publication Number Publication Date
WO2023040679A1 true WO2023040679A1 (zh) 2023-03-23

Family

ID=78974417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116786 WO2023040679A1 (zh) 2021-09-16 2022-09-02 人脸图片的融合方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN113850168A (zh)
WO (1) WO2023040679A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310657A (zh) * 2023-05-12 2023-06-23 北京百度网讯科技有限公司 特征点检测模型训练方法、图像特征匹配方法及装置
CN117993480A (zh) * 2024-04-02 2024-05-07 湖南大学 面向设计师风格融合和隐私保护的aigc联邦学习方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850168A (zh) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 人脸图片的融合方法、装置、设备及存储介质
CN114418919B (zh) * 2022-03-25 2022-07-26 北京大甜绵白糖科技有限公司 图像融合方法及装置、电子设备和存储介质
KR20230141429A (ko) 2022-03-30 2023-10-10 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 이미지 프로세싱 방법 및 장치, 컴퓨터 디바이스, 컴퓨터-판독가능 저장 매체, 및 컴퓨터 프로그램 제품
CN115278297B (zh) * 2022-06-14 2023-11-28 北京达佳互联信息技术有限公司 基于驱动视频的数据处理方法、装置、设备及存储介质
CN114845067B (zh) * 2022-07-04 2022-11-04 中科计算技术创新研究院 基于隐空间解耦的人脸编辑的深度视频传播方法
CN116246022B (zh) * 2023-03-09 2024-01-26 山东省人工智能研究院 一种基于渐进式去噪引导的人脸图像身份合成方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339420A (zh) * 2020-02-28 2020-06-26 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN111860167A (zh) * 2020-06-18 2020-10-30 北京百度网讯科技有限公司 人脸融合模型获取及人脸融合方法、装置及存储介质
US20210064857A1 (en) * 2018-05-17 2021-03-04 Mitsubishi Electric Corporation Image analysis device, image analysis method, and recording medium
CN112560753A (zh) * 2020-12-23 2021-03-26 平安银行股份有限公司 基于特征融合的人脸识别方法、装置、设备及存储介质
CN112766160A (zh) * 2021-01-20 2021-05-07 西安电子科技大学 基于多级属性编码器和注意力机制的人脸替换方法
CN113343878A (zh) * 2021-06-18 2021-09-03 北京邮电大学 基于生成对抗网络的高保真人脸隐私保护方法和系统
CN113850168A (zh) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 人脸图片的融合方法、装置、设备及存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210064857A1 (en) * 2018-05-17 2021-03-04 Mitsubishi Electric Corporation Image analysis device, image analysis method, and recording medium
CN111339420A (zh) * 2020-02-28 2020-06-26 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN111860167A (zh) * 2020-06-18 2020-10-30 北京百度网讯科技有限公司 人脸融合模型获取及人脸融合方法、装置及存储介质
CN112560753A (zh) * 2020-12-23 2021-03-26 平安银行股份有限公司 基于特征融合的人脸识别方法、装置、设备及存储介质
CN112766160A (zh) * 2021-01-20 2021-05-07 西安电子科技大学 基于多级属性编码器和注意力机制的人脸替换方法
CN113343878A (zh) * 2021-06-18 2021-09-03 北京邮电大学 基于生成对抗网络的高保真人脸隐私保护方法和系统
CN113850168A (zh) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 人脸图片的融合方法、装置、设备及存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310657A (zh) * 2023-05-12 2023-06-23 北京百度网讯科技有限公司 特征点检测模型训练方法、图像特征匹配方法及装置
CN116310657B (zh) * 2023-05-12 2023-09-01 北京百度网讯科技有限公司 特征点检测模型训练方法、图像特征匹配方法及装置
CN117993480A (zh) * 2024-04-02 2024-05-07 湖南大学 面向设计师风格融合和隐私保护的aigc联邦学习方法

Also Published As

Publication number Publication date
CN113850168A (zh) 2021-12-28

Similar Documents

Publication Publication Date Title
WO2023040679A1 (zh) 人脸图片的融合方法、装置、设备及存储介质
Deng et al. Uv-gan: Adversarial facial uv map completion for pose-invariant face recognition
CN113569791B (zh) 图像处理方法及装置、处理器、电子设备及存储介质
Baldassarre et al. Deep koalarization: Image colorization using cnns and inception-resnet-v2
CN111489287B (zh) 图像转换方法、装置、计算机设备和存储介质
CN111754596B (zh) 编辑模型生成、人脸图像编辑方法、装置、设备及介质
WO2022156640A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
WO2021052375A1 (zh) 目标图像生成方法、装置、服务器及存储介质
WO2020103700A1 (zh) 一种基于微表情的图像识别方法、装置以及相关设备
CN111444881A (zh) 伪造人脸视频检测方法和装置
CN111553267B (zh) 图像处理方法、图像处理模型训练方法及设备
CN110084193B (zh) 用于面部图像生成的数据处理方法、设备和介质
CN106650617A (zh) 一种基于概率潜在语义分析的行人异常识别方法
CN111833360B (zh) 一种图像处理方法、装置、设备以及计算机可读存储介质
JP2016085579A (ja) 対話装置のための画像処理装置及び方法、並びに対話装置
WO2022188697A1 (zh) 提取生物特征的方法、装置、设备、介质及程序产品
CN115565238B (zh) 换脸模型的训练方法、装置、设备、存储介质和程序产品
WO2024051480A1 (zh) 图像处理方法、装置及计算机设备、存储介质
CN114973349A (zh) 面部图像处理方法和面部图像处理模型的训练方法
CN113298018A (zh) 基于光流场和脸部肌肉运动的假脸视频检测方法及装置
CN116740261A (zh) 图像重建方法和装置、图像重建模型的训练方法和装置
CN111080754B (zh) 一种头部肢体特征点连线的人物动画制作方法及装置
Paterson et al. 3D head tracking using non-linear optimization.
CN111325252A (zh) 图像处理方法、装置、设备、介质
CN116229528A (zh) 一种活体掌静脉检测方法、装置、设备及存储介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE