CN114495190A

CN114495190A - Training method of face changing network model, image face changing method and related equipment

Info

Publication number: CN114495190A
Application number: CN202110885689.0A
Authority: CN
Inventors: 陈圣; 蒋宁; 王洪斌; 周迅溢; 吴海英; 曾定衡
Original assignee: Mashang Xiaofei Finance Co Ltd
Current assignee: Mashang Xiaofei Finance Co Ltd
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2022-05-13

Abstract

The application provides a training method of a face changing network model, an image face changing method and related equipment, wherein a plurality of image sample pairs can be obtained when the face changing network model is trained; according to the method, an original image sample and a target image sample which are included by an image sample pair are input into an initial face changing network model, face features are extracted from the target image sample for multiple times, attribute features are extracted from the original image sample for multiple times, then the face features and the attribute features which are extracted for multiple times are fused to obtain a face changing image sample, the face changing image sample can contain more face features and attribute features, and therefore network parameters of the initial face changing network model are updated by combining each image sample pair and the face changing image sample which contains more face features and attribute features and corresponds to each image sample pair, the training of the face changing network model is facilitated, the final face changing network model can better perform face changing operation, a real natural face changing image is obtained, and the face changing effect of the face changing image is effectively improved.

Description

Training method of face changing network model, image face changing method and related equipment

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a training method for a face changing network model, an image face changing method, and a related device.

Background

Face changing is widely used in the fields of content creation, movie production, and entertainment video production. The face changing means that an original image and a target image are given, identity features in the target image are transferred to the original image, a face changing image is obtained, and the obtained face changing image not only keeps the identity features of the target image, but also has attribute features of the original image, such as posture, expression, illumination, background and the like.

In the prior art, when an image is changed, the image is changed based on a face key point, and the specific process is as follows: the method comprises the steps of firstly obtaining key points of the face of an original image and key points of the face of a target image, extracting the face area of the original image through the key points of the original image, and fusing the face area of the original image into the target image according to the key points of the target image, so that image face changing is completed.

However, it is difficult to acquire a real and natural face-changing image in this way, so that the face-changing effect of the face-changing image is poor.

Disclosure of Invention

The embodiment of the application provides a training method of a face changing network model, an image face changing method and related equipment, so that a real and natural face changing image can be obtained, and the face changing effect of the face changing image is improved.

In a first aspect, an embodiment of the present application provides a method for training a face-changing network model, where the method for training the face-changing network model may include:

acquiring a plurality of image sample pairs, wherein each image sample pair comprises an original image sample and a target image sample corresponding to the original image sample;

for each image sample, inputting an original image sample and a target image sample in the image sample pair into an initial face changing network model, extracting face features from the target image sample for multiple times, extracting attribute features from the original image sample for multiple times, and fusing the extracted face features and the extracted attribute features for multiple times to obtain a face changing image sample corresponding to the image sample pair;

and updating the network parameters of the initial face-changing network model according to the image sample pairs and the face-changing image samples corresponding to the image sample pairs.

It can be seen that when the face-changing network model is trained, a plurality of image sample pairs can be obtained first; according to the method, for each image sample, an original image sample and a target image sample which are included by the image sample pair are input into an initial face changing network model, face features are extracted from the target image sample for multiple times, attribute features are extracted from the original image sample for multiple times, and then the face features and the attribute features which are extracted for multiple times are fused to obtain a face changing image sample, so that the face changing image sample can contain more face features and attribute features.

In a second aspect, an embodiment of the present application provides an image face changing method, where the image face changing method may include:

acquiring an original image and a target image comprising a target face;

and inputting the original image and the target image into a face changing network model, extracting human face features from the target image for multiple times, extracting attribute features from the original image for multiple times, and fusing the human face features extracted for multiple times and the attribute features extracted for multiple times to obtain a face changing image.

It can be seen that, when face changing operation is executed, an original image and a target image including a target face may be obtained first, the original image and the target image are input into a face changing network model, face features are extracted from the target image for multiple times, attribute features are extracted from the original image for multiple times, and the face features extracted for multiple times and the attribute features extracted for multiple times are fused to obtain a face changing image. Therefore, the face changing image is obtained by fusing the face features extracted for many times and the attribute features extracted for many times, so that the face changing image can contain more face features and attribute features, a real and natural face changing image can be obtained, and the face changing effect of the face changing image is effectively improved.

In a third aspect, an embodiment of the present application provides a training apparatus for a face-changing network model, where the training apparatus for a face-changing network model may include:

the image processing device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of image sample pairs, and each image sample pair comprises an original image sample and a target image sample corresponding to the original image sample;

the processing unit is used for inputting an original image sample and a target image sample in the image sample pair into an initial face changing network model, extracting human face features from the target image sample for multiple times, extracting attribute features from the original image sample for multiple times, and fusing the human face features extracted for multiple times and the attribute features extracted for multiple times to obtain a face changing image sample corresponding to the image sample pair;

and the training unit is used for updating the network parameters of the initial face changing network model according to the image sample pairs and the face changing image samples corresponding to the image sample pairs.

In a fourth aspect, an embodiment of the present application provides an image face changing device, which may include:

an acquisition unit configured to acquire an original image and a target image including a target face;

and the processing unit is used for inputting the original image and the target image into a face changing network model, extracting human face features from the target image for multiple times, extracting attribute features from the original image for multiple times, and fusing the human face features extracted for multiple times and the attribute features extracted for multiple times to obtain a face changing image.

In a fifth aspect, an embodiment of the present application provides an electronic device, which may include: a processor and a memory;

a memory; for storing a computer program.

The processor is configured to read a computer program stored in the memory, and execute the face-changing network model training method according to the first aspect, or execute the image face-changing method according to the second aspect.

In a sixth aspect, the present application further provides a readable storage medium, in which computer program is stored computer executable instructions, when executed by a processor, for implementing a training method for a face-changing network model as described in the first aspect above, or for implementing an image face-changing method as described in the second aspect above.

In a seventh aspect, an embodiment of the present application further provides a computer program product, where the computer program product includes a computer program, and when executed, the computer program implements the method for training a face-changing network model according to the first aspect; alternatively, the computer program when executed implements an image facelining method as described above in the second aspect.

Drawings

Fig. 1 is a schematic diagram of an implementation environment of a training method for a face-changing network model according to an embodiment of the present application;

fig. 2 is a schematic flowchart of a training method of a face-changing network model according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of an initial face-changing network model according to an embodiment of the present disclosure;

fig. 4 is a schematic structural diagram of another initial face-changing network model provided in an embodiment of the present application;

fig. 5 is a schematic processing diagram of a Mix Layer according to an embodiment of the present disclosure;

fig. 6 is a schematic flowchart of an image face changing method according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a training apparatus for a face-changing network model according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of an image face changing device according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of including other sequential examples in addition to those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical scheme provided by the embodiment of the application can be applied to scenes such as content generation, movie production, entertainment video production and the like. The face changing means that an original image and a target image are given, identity features in the target image are transferred to the original image, a face changing image is obtained, and the obtained face changing image not only keeps the identity features of the target image, but also has attribute features of the original image, such as posture, expression, illumination, background and the like.

In the prior art, when face changing operation is performed on an image, image face changing is usually realized in a mode based on key points of the face, but it is difficult to acquire a real and natural face changing image by adopting the mode, so that the face changing effect of the face changing image is poor.

In order to obtain a real and natural face-changing image and improve the face-changing effect of the face-changing image, in the embodiment of the application, a plurality of image sample pairs are obtained in advance; according to the method, for each image sample, an original image sample and a target image sample which are included by the image sample pair are input into an initial face changing network model, face features are extracted from the target image sample for multiple times, attribute features are extracted from the original image sample for multiple times, and then the face features and the attribute features which are extracted for multiple times are fused to obtain a face changing image sample, so that the face changing image sample can contain more face features and attribute features.

Fig. 1 is a schematic diagram of an implementation environment of a training method for a face-changing network model provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment provided in this embodiment mainly includes: the terminal device 102 communicates with the server 101 in a wireless mode or a wired mode. The wired mode may be that data transmission is performed between the terminal device 102 and the server 101 through a High Definition Multimedia Interface (HDMI) or other lines, and the wireless mode may be that communication is performed between the terminal device 102 and the server 101 through bluetooth, WIFI or other modes.

In addition, the implementation environment of this embodiment may further include a database 103, and the training audio data set is stored in the database 103. In one implementation, as shown in fig. 1, the server 101 may obtain a plurality of image sample pairs for training a face-changing network model from the database 103, and then perform model training according to the obtained plurality of image sample pairs, thereby obtaining the face-changing network model. After the face-changing network model is trained, the face-changing network model may be deployed in the terminal device 102, and the terminal device 102 may execute face-changing operation according to the face-changing network model, so as to implement face-changing operation of the terminal device 102 or an application program deployed in the terminal device 102.

In another implementation manner, the terminal device 102 may also directly obtain a plurality of image sample pairs for training the face-changing network model from the database 103, and then perform model training according to the obtained plurality of image sample pairs, thereby obtaining the face-changing network model. After the face-changing network model is trained, the terminal device 102 may execute a face-changing operation according to the face-changing network model, so as to implement the face-changing operation of the terminal device 102 or an application program deployed in the terminal device 102.

It should be noted that the terminal device 102 may be, but is not limited to, a smart interactive device such as a smart phone, a tablet, a personal computer, a smart appliance (for example, a water heater, a washing machine, a television, a smart speaker, etc.), a smart wearable device, and the like.

In addition, the server 101 may be an independently deployed server or may be a cluster server.

It should be noted that the method provided by the present application may be widely applied to different application scenarios related to face changing operations, and the following describes in detail the implementation processes of the training method of the face changing network model and the image face changing method provided by the present application in combination with specific application scenarios. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a flowchart illustrating a method for training a face-changing network model according to an embodiment of the present disclosure, where the method for training the face-changing network model may be executed by software and/or a hardware device, and the hardware device may be the server 101 or the terminal device 102. For example, referring to fig. 2, the training method of the face-changing network model may include:

s201, obtaining a plurality of image sample pairs, wherein each image sample pair comprises an original image sample and a target image sample corresponding to the original image sample.

It is understood that the original image sample may be understood as an image sample for providing attribute information when performing a face changing operation; for example, the attribute information may include gestures, expressions, lighting, and background. The target image sample may be understood as an image sample for providing face information when performing a face change operation. The face-changing image generated based on the original image sample and the target image sample comprises the face information of the target image sample and the attribute information of the original image sample.

The manner of acquiring the plurality of image sample pairs is as follows: a plurality of pre-stored image sample pairs can be directly obtained from a database; the image sample pairs may also be obtained from a third-party training system, and of course, the image sample pairs may also be obtained in other manners, which may be specifically set according to actual needs.

S202, inputting an original image sample and a target image sample in the image sample pair into an initial face changing network model, extracting face features from the target image sample for multiple times, extracting attribute features from the original image sample for multiple times, and fusing the extracted face features and the extracted attribute features for multiple times to obtain a face changing image sample corresponding to the image sample pair.

For example, the initial face-changing network model may include an attribute feature extraction model for extracting attribute features, a face feature extraction model for extracting face features, and a feature fusion layer for fusing the attribute features and the face features, as shown in fig. 3, fig. 3 is a schematic structural diagram of the initial face-changing network model provided in the embodiment of the present application, and an output of the feature fusion layer is a face-changing image sample.

Referring to fig. 3, when the face-changed image samples corresponding to the image sample pairs are obtained, in view of the similarity of the methods of the face-changed image samples corresponding to the image sample pairs, for avoiding redundancy, how to obtain the face-changed image samples corresponding to the image sample pairs will be described by taking the example of obtaining the face-changed image sample corresponding to any one of the image sample pairs.

For example, when a face change image sample corresponding to an image sample pair is obtained, the image sample pair target image sample may be input into a face feature extraction model, face features may be extracted from the target image sample for multiple times, and multiple face features may be output; inputting an original image sample into an attribute feature extraction model, extracting attribute features from the original image sample for multiple times, and outputting multiple attribute features; and inputting the attribute features and the face features into a feature fusion model for fusion to obtain a face-changed image sample corresponding to the image sample.

For example, when face features are extracted from a target image sample for multiple times, as shown in fig. 4, fig. 4 is a schematic structural diagram of another initial face-changing network model provided in this embodiment of the present application, where the face feature extraction model includes a first downsampling network and a first upsampling network, where the first downsampling network includes a plurality of first convolutional layers connected in sequence, the first upsampling network includes a plurality of second convolutional layers connected in sequence, and face features in the target image sample can be extracted through a last first convolutional layer in the first downsampling network in the face feature extraction model, which is exemplarily denoted as Id 1; it can be understood that the face features extracted by the last first convolution layer are the face features that the target image sample sequentially passes through the first convolution layer in the first downsampling network and is output through the last first convolution layer; and extracting the face features in the target image sample by using each second convolution layer in the network in the first, as shown in fig. 4, which may be denoted as Id2, Id3, Id4, and Id5, the extracted face features are different each time, and in general, the face features proposed later can represent the face features in the target image sample, so that the face features are extracted from the target image sample multiple times by the face feature extraction model, and the extracted face features multiple times are Id1, Id2, Id3, Id4, and Id5, so that the face change image sample for training the face change network model is obtained by fusing the extracted face features multiple times, so that the face change image sample can contain more face features, which is more helpful for training the face change network model, so that the trained face change network model can better perform face change operation, and can obtain a real and natural face change image, thereby effectively improving the face changing effect of the face changing image.

It can be understood that, in the embodiment of the present application, the face feature extraction model is a pre-trained network model, and is not trained in the whole face-changing network model, and when the face-changing network model is trained, an attribute feature extraction model is trained together.

When extracting attribute features from the original image sample for a plurality of times, as shown in fig. 4, the attribute feature extraction model may include a second downsampling network and a second upsampling network, the second downsampling network includes a plurality of third convolutional layers, the second upsampling network includes a plurality of fourth convolutional layers, and the attribute features in the original image sample may be extracted through a last third convolutional layer in the second downsampling network, which may be denoted as Xt1, for example, it may be understood that the attribute features extracted by the last third convolutional layer are the attribute features that the original image sample sequentially passes through the third convolutional layers in the second downsampling network and is output through the last third convolutional layer; and extracting attribute features in the original image sample through each fourth convolution layer in the second upsampling network, which may be denoted as Xt2, Xt3, Xt4, and Xt5 as shown in fig. 4, where the attribute features extracted each time are different, and normally, the attribute features proposed later can represent the attribute features in the original image sample, so that the attribute features are extracted from the original image sample multiple times through the attribute feature extraction model, and the attribute features extracted multiple times are Xt1, Xt2, Xt3, Xt4, and Xt5, respectively, so that the face change image sample for training the face change network model is obtained through fusion of the attribute features extracted multiple times, so that the face change image sample can contain more attribute features, which is more helpful for training the face change network model, so that the trained face change network model can better perform face change operation, and can obtain a real and natural face change image, thereby effectively improving the face changing effect of the face changing image.

For example, in this embodiment of the application, a first convolution layer in a first downsampling network in the face feature extraction model, a third convolution layer in a second downsampling network in the attribute feature extraction model, a step size may be 2, a size of a convolution kernel may be 64, and an active layer (Prelu layer) is after the convolution layers, a second convolution layer in a first upsampling network in the face feature extraction model, and a fourth convolution layer in the second downsampling network in the attribute feature extraction model may be 3 × 3 convolution, a step size may be 1, a size of the convolution kernel may be 64, and the active layer (Prelu layer) is after the convolution layers, and then an inverse convolution layer is performed on the basis of the convolution layers. Where a step size of 2 means a step size of 2 for each move, and the resolution is reduced, which means down-sampling. The step length is 1, only the convolution layer of the feature extraction is simply represented, and the attribute feature or the face feature extracted from the previous layer is further extracted.

In the embodiment of the present application, the first convolutional layer, the second convolutional layer, the third convolutional layer and the fourth convolutional layer are only used for distinguishing different convolutional layers, where the first convolutional layer is a convolutional layer in a first downsampling network in the face feature extraction model, the second convolutional layer is a convolutional layer in a first upsampling network in the face feature extraction model, the third convolutional layer is a convolutional layer in a second downsampling network in the attribute feature extraction model, and the fourth convolutional layer is a convolutional layer in a second upsampling network in the attribute feature extraction model.

After the attribute features and the face features are extracted for multiple times respectively, the attribute features and the face features extracted for multiple times can be fused through a feature fusion model in an initial face changing network model to obtain a face changing image sample corresponding to the image sample. For example, the feature fusion model may be an AAD model, which may consist of 5 AAD ResBlk; wherein, AAD ResBlk is formed by stacking a plurality of AAD + Relu + Conv.

Illustratively, when the feature fusion model in the initial face-changing network model is fused, the feature fusion model includes a first feature fusion layer and a plurality of second feature fusion layers that are sequentially connected, the plurality of second feature fusion layers respectively correspond to the plurality of second convolution layers and the plurality of fourth convolution layers one to one, and the face features extracted from the last first convolution layer and the attribute features extracted from the last third convolution layer can be fused by the first feature fusion layer to obtain a fused image; and for each second feature fusion layer, fusing the face features extracted from the corresponding second convolution layer, the attribute features extracted from the corresponding fourth convolution layer and the fused image fused by the previous feature fusion layer, and determining the fused image fused by the last second feature fusion layer as the face change image sample corresponding to the image sample pair.

As shown in fig. 4, the face feature Id1 extracted from the last first convolution layer and the attribute feature Xt1 extracted from the last third convolution layer may be used as inputs of the first feature fusion layer, and the face feature Id1 extracted from the last first convolution layer and the attribute feature Xt1 extracted from the last third convolution layer are fused by the first feature fusion layer to obtain a fused image, and it should be noted that, when the face feature Id1 extracted from the last first convolution layer is input into the first feature fusion layer for fusion, the face feature Id1 extracted from the last first convolution layer needs to be input twice, because: the residual model in the first feature fusion layer needs to do the residual between the input and the output, so an extra copy of the face feature Id1 needs to be input. Then, the face feature Id2 extracted from the first second convolution layer in the first up-sampling, the attribute feature Xt2 extracted from the first fourth convolution layer in the second up-sampling and the fused image obtained by fusing the first feature fusion layer are used as the input of the first second feature fusion layer, and the face feature Id2 extracted from the first second convolution layer in the first up-sampling, the attribute feature Xt2 extracted from the first fourth convolution layer in the second up-sampling and the fused image obtained by fusing the first feature fusion layer are fused through the first second feature fusion layer to obtain a new fused image; then, the face feature Id3 extracted from the second convolution layer in the first up-sampling, the attribute feature Xt3 extracted from the second fourth convolution layer in the second up-sampling and the fused image obtained by fusing the first second feature fusion layer are used as the input of the second feature fusion layer, and the face feature Id3 extracted from the second convolution layer in the first up-sampling, the attribute feature Xt3 extracted from the second fourth convolution layer in the second up-sampling and the fused image obtained by fusing the first second feature fusion layer are fused through the second feature fusion layer to obtain a new fused image; by analogy, a fused image obtained by fusing the face feature Id5 extracted from the fourth second convolution layer in the first up-sampling, the attribute feature Xt5 extracted from the fourth convolution layer in the second up-sampling, and the third second feature fusion layer is used as the input of the fifth second feature fusion layer, the face feature Id5 extracted from the fourth second convolution layer in the first up-sampling, the attribute feature Xt5 extracted from the fourth convolution layer in the second up-sampling, and the fused image obtained by fusing the fourth second feature fusion layer are fused by the fifth second feature fusion layer, so as to obtain a new fused image, the new fused image is the face change image sample corresponding to the image sample, and the face change image sample includes the face information of the target image sample, including the attribute information of the original image sample.

After the face features extracted for multiple times and the attribute features extracted for multiple times are fused to obtain face-changed image samples corresponding to the image sample pairs, the network parameters of the initial face-changed network model can be updated according to the face-changed image samples corresponding to the image sample pairs and the image sample pairs, that is, the following S203 is executed:

and S203, updating the network parameters of the initial face changing network model according to the image sample pairs and the face changing image samples corresponding to the image sample pairs.

For example, when updating the network parameters of the initial face-changing network model according to each image sample pair and the face-changing image sample corresponding to each image sample pair, the target image samples in each image sample pair may be subjected to the super-resolution processing to obtain super-resolution target image samples of each image sample; carrying out hyper-resolution processing on the corresponding face-changed image samples to obtain hyper-resolution face-changed image samples corresponding to the image samples; generating a mixed image sample corresponding to each image sample pair according to the super-divided target image sample of each image sample pair and the super-divided face image sample; and updating the network parameters of the initial face-changing network model according to the original image sample, the target image sample, the corresponding face-changing image sample and the mixed image sample of each image sample pair.

When the hyper-resolution processing is performed on each image sample including the target image sample corresponding to the original image sample and the corresponding face-changed image sample, for example, a DFDNet hyper-resolution network may be used for the hyper-resolution processing, or other hyper-resolution networks may also be used, and the hyper-resolution processing may be specifically set according to actual needs. For example, in this embodiment of the present application, a DFDNet hyper-resolution network may be adopted to perform hyper-resolution processing on each image sample pair including a target image sample corresponding to an original image sample and a corresponding face-changed image sample, because: the DFDNet hyper-division network has better expression on face hyper-division, and the DFDNet hyper-division network can amplify the face features in the target image sample corresponding to the original image sample and the corresponding face change image sample, thereby playing a greater auxiliary role on the following confrontation detection network.

For example, when the DFDNet hyper-resolution network is used to perform the hyper-resolution processing on the target image sample and the corresponding face-changing image sample, the DFDNet hyper-resolution network may perform the 4-fold hyper-resolution processing, may perform the 2-fold hyper-resolution processing, may perform the 6-fold hyper-resolution processing, and may be specifically set according to actual needs.

When generating a mixed image sample corresponding to each image sample pair according to a super-divided target image sample of each image sample pair and a super-divided face image sample, considering that the generation manner of the mixed image sample corresponding to each image sample pair is similar, for example, how to generate each image sample pair corresponding to the super-divided target image sample of each image sample pair and the super-divided face image sample is described by taking generation of the mixed image sample corresponding to any image sample pair as an example.

For example, when a mixed image sample corresponding to a sample pair is obtained, a first face region may be identified from a super-target image sample of the image sample pair; identifying a second face region from the hyper-conversion face image sample corresponding to the image sample; carrying out segmentation processing on the first face area to obtain a plurality of first image area blocks; carrying out segmentation processing on the second face area to obtain a plurality of second image area blocks; and then selecting image area blocks from the plurality of first image area blocks and the plurality of second image area blocks for splicing to generate a mixed image sample corresponding to the image sample pair.

Illustratively, with one Mix Layer, the blend Layer generates a blend image sample corresponding to the image sample pair. As shown in fig. 5, fig. 5 is a schematic view of a processing procedure of a Mix Layer provided in the embodiment of the present application, a hyper-target image sample of an image sample pair and a corresponding hyper-conversion face image sample may be used as two inputs of the Mix Layer, and input into the Mix Layer, and landmarks (landmarks) in the hyper-target image sample and the corresponding hyper-conversion face image are respectively detected by dlib; where dlib is a library that is dedicated to detecting human face landmarks. According to the detected landmark in the super-resolution target image sample, a human face area in the super-resolution target image sample is cut and can be marked as a first human face area; according to the detected landmark in the super-divided face image sample, a face area in the super-divided face image sample is cut out and can be marked as a second face area; performing segmentation processing on the first face area, for example, performing segmentation processing by using a python function to obtain a plurality of first face area blocks; segmenting the second face area to obtain a plurality of second face area blocks; and then selecting image region blocks from the plurality of first face region blocks and the plurality of second face region blocks for splicing to generate a mixed image sample corresponding to the image sample.

When updating the network parameters of the initial face-changing network model according to each image sample pair and the face-changing image sample corresponding to each image sample pair, considering that the loss exists in the lower branch network corresponding to the face feature extraction model, the loss exists in the lower branch network corresponding to the attribute feature extraction model, the loss exists in the confrontation detection network, and the loss is identified, therefore, according to the original image sample, the target image sample, the corresponding face-changing image sample and the mixed image sample of each image sample pair, a first loss function, a second loss function, a third loss function and a fourth loss function corresponding to each image sample pair are determined; wherein, the first loss function can be understood as the loss existing in the lower branch network corresponding to the face feature extraction model; the second loss function can be understood as the loss existing in the lower branch network corresponding to the attribute feature extraction model; the third loss function can be understood as the existence of loss in the countermeasure detection network; the fourth loss function may be understood as identifying a loss; and updating the network parameters of the initial face changing network model according to the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair. In this way, the network parameters of the initial face-changing network model are updated together by combining the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair, and more losses are considered, so that the accuracy of model training can be improved to a certain extent.

It can be understood that, in determining the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair, since the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair are similar, for avoiding redundancy, how to obtain the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair will be described by taking obtaining the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to any one of the image sample pairs as an example.

For example, when a first loss function, a second loss function, a third loss function and a fourth loss function corresponding to an image sample pair are obtained, the face change image sample corresponding to the image sample pair may be input into the face feature extraction model to obtain a first face feature corresponding to the face change image sample, and the first loss function corresponding to the image sample pair is determined according to a distance between the first face feature and the second face feature; wherein the second face features are obtained based on face features extracted from the target image sample for a plurality of times. Inputting the face-changed image samples corresponding to the image sample pairs into an attribute feature extraction model to obtain first attribute features corresponding to the face-changed image samples, and determining second loss functions corresponding to the image sample pairs according to the distance between the first attribute features and the second attribute features; and the second attribute feature is obtained based on the attribute features extracted from the original image sample for multiple times. And inputting the mixed image sample corresponding to the image sample pair into the countermeasure detection network to obtain a detection result of the image sample pair, and determining a third loss function corresponding to the image sample pair according to the detection result. And determining a fourth loss function corresponding to the image sample pair according to the face-changed image sample and the target image sample corresponding to the image sample pair.

For example, when a first loss function corresponding to an image sample pair is determined according to a distance between a first face feature and a second face feature, as shown in fig. 4, a face feature extraction model may extract five face features from a face change image sample for multiple times, average the five face features, and determine the average as the first face feature corresponding to the face change image sample; similarly, when a second face feature is obtained based on face features extracted from a target image sample in the image sample pair for multiple times, the face feature extraction model can also extract five face features from the target image sample for multiple times, average the five face features, and determine the average value as the second face feature corresponding to the target image sample; after the first facial feature and the second facial feature are respectively determined, a first loss function corresponding to the image sample pair may be determined by using the following formula 1.

l_id＝1-cos(z_id(y_s，t)，z_id(X_s) Equation 1)

Wherein l_idRepresenting a first loss function, y, corresponding to pairs of image samples_s，tRepresenting face-changed image samples, X_sRepresenting a sample of the target image, z_id(y_s，t) Representing a first facial feature, z, corresponding to the face-changed image sample_id(X_s) And representing second face features corresponding to the target image samples.

For example, when determining the second loss function corresponding to the image sample pair according to the distance between the first attribute feature and the second attribute feature, similarly as shown in fig. 4, the attribute feature extraction model may extract five attribute features from the face-changed image sample for multiple times, average the five attribute features, and determine the average as the first attribute feature corresponding to the face-changed image sample. Similarly, when second attribute features are obtained based on the attribute features extracted from the original image sample for multiple times in the image sample pair, the attribute feature extraction model can also extract five attribute features from the original image sample for multiple times, average the five attribute features, and determine the average value as the second attribute feature corresponding to the original image sample; after the first attribute feature and the second attribute feature are determined, respectively, a second loss function corresponding to the image sample pair may be determined by using the following equation 2.

Wherein l_attRepresenting a second loss function, z, corresponding to the image sample pairs_att(y_s，t) Representing a first attribute, X, corresponding to the sample of the face-changed image_tRepresenting samples of the original image, z_att(X_t) And representing a second attribute characteristic corresponding to the original image sample.

For example, when determining the third loss function corresponding to the image sample pair according to the detection result corresponding to the image sample pair, the third loss function corresponding to the image sample pair may be determined by using the following formula 3.

l_gan＝D(y_s，t) Equation 3

Wherein l_ganRepresenting a third loss function, D (y), corresponding to the image sample pairs_s，t) Representing the detection result corresponding to the image sample pair.

For example, when determining the fourth loss function corresponding to the image sample pair according to the face-changed image sample and the target image sample corresponding to the image sample pair, the method may be divided into two cases, and in one case, if the original image sample and the target image sample included in the image sample pair are different, the fourth loss function corresponding to the image sample pair may be directly determined to be 0; on the contrary, if the original image sample and the target image sample included in the image sample pair are the same, the norm between the face-changed image sample and the original image sample may be calculated, and the fourth loss function corresponding to the image sample pair may be determined according to the norm, which may be shown in the following formula 4:

wherein l_recRepresenting a fourth loss function corresponding to the pair of image samples.

In this way, the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair are calculated by using the above equations 1, 2, 3, and 4, respectively.

For example, when the network parameters of the initial face-changing network model are updated according to the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair, the target loss function corresponding to each image sample may be determined according to the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair; and updating the network parameters of the initial face changing network model according to the target loss function corresponding to each image sample pair.

For example, when determining the target loss function corresponding to each image sample pair according to the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair, for each image sample pair, a first product between the first loss function corresponding to the image sample pair and the first weight corresponding to the image sample pair may be calculated, a second product between the second loss function and the second weight corresponding to the image sample pair may be calculated, a third product between the third loss function and the third weight corresponding to the image sample pair may be calculated, a fourth product between the fourth loss function and the fourth weight corresponding to the image sample pair may be calculated, and a sum of the first product, the second product, the third product, and the fourth product may be determined as the target loss function corresponding to the image sample pair, as shown in the following formula 5:

L＝α_idL_id+α_attL_att+α_ganL_gan+α_recL_recequation 5

Wherein L represents the corresponding target loss function of the image sample pair, L_idRepresenting a first loss function, α_iDenotes a first weight, L_attRepresenting a second loss function, α_attRepresents a second weight, L_ganRepresenting a third loss function, α_ganRepresents a third weight, L_recRepresenting a fourth loss function, α_recRepresenting a fourth weight.

Illustratively, in the embodiments of the present application, the first weight α_idMay be set to 5, the second weight α_attAnd a fourth weight α_recMay be set to 10, and the third weight α_ganMay be set to 1, of course, the first weight α_idMay be set to 0.5, the second weight α_attAnd a fourth weight α_recMay be set to 1, and the third weight α_ganThe value of (a) may be set to 0.1, and may be specifically set according to actual needs, where the embodiment of the present application is not specifically limited.

Based on the above formula 5, the target loss function corresponding to each image sample pair may be obtained, and in view of a batch of image sample pairs used for performing a training process, the target loss functions corresponding to each image sample pair may be averaged, and the network parameters of the initial face replacement network model may be updated according to the average loss function. It can be understood that, in the embodiment of the present application, since the face feature extraction model is a pre-trained network model and will not be trained in the whole face-changing network model, the network parameters of the initial face-changing network model are updated, which is mainly to update the network parameters in the upper branch corresponding to the attribute feature extraction model in the modified face-changing network model.

When updating the network parameters of the initial face-changing network model according to the average loss function, judging whether the updated initial face-changing network model is converged; if the initial face changing network model is converged, determining the updated initial face changing network model as a face changing network model; if not, executing the following steps until the updated initial face changing network model converges:

inputting an original image sample and a target image sample in a plurality of new image sample pairs into an updated initial face-changing network model, extracting face features from the target image sample for multiple times, extracting attribute features from the original image sample for multiple times, and fusing the extracted face features and the extracted attribute features for multiple times to obtain face-changing image samples corresponding to each new image sample pair; and updating the updated network parameters of the initial face-changing network model according to the new image sample pairs and the face-changing image samples.

It can be seen that, in the embodiment of the application, when the face change network model is trained, a plurality of image sample pairs can be obtained first; according to the method, for each image sample, an original image sample and a target image sample which are included by the image sample pair are input into an initial face changing network model, face features are extracted from the target image sample for multiple times, attribute features are extracted from the original image sample for multiple times, and then the face features and the attribute features which are extracted for multiple times are fused to obtain a face changing image sample, so that the face changing image sample can contain more face features and attribute features.

After the face-changing network model is obtained through the training of the embodiment shown in fig. 2, the face-changing network model can be applied to scenes such as content generation, movie production, and entertainment video production, so as to implement face-changing operation through the face-changing network model.

Fig. 6 is a flowchart illustrating an image face changing method according to an embodiment of the present application, where the image face changing method may be executed by software and/or a hardware device, for example, the hardware device may be the server 101 or the terminal apparatus 102. For example, referring to fig. 6, the image face changing method may include:

s601, acquiring an original image and a target image comprising a target face.

For example, an original image and a target image including a target face may be acquired by receiving a user input; the original image and the target image including the target face may also be acquired from the local memory, or the original image and the target image including the target face may also be acquired in other manners, which may be specifically set according to actual needs.

S602, inputting the original image and the target image into a face changing network model, extracting human face features from the target image for multiple times, extracting attribute features from the original image for multiple times, and fusing the human face features extracted for multiple times and the attribute features extracted for multiple times to obtain a face changing image.

It should be noted that, in the method of inputting the original image and the target image into the face changing network model, extracting face features from the target image for multiple times, extracting attribute features from the original image for multiple times, and fusing the extracted face features and the extracted attribute features for multiple times to obtain the face changing image, in S302, inputting the target image sample, which includes the original image sample and the original image sample, of the image sample into the initial face changing network model, extracting face features from the target image sample for multiple times, extracting attribute features from the original image sample for multiple times, and fusing the extracted face features and the extracted attribute features for multiple times to obtain the face changing image sample, the method of obtaining the image sample from the corresponding face changing image sample is similar to that described in S202, which is not described herein again.

It can be seen that, in the embodiment of the present application, when a face changing operation is performed, an original image and a target image including a target face may be obtained first, the original image and the target image are input into a face changing network model, face features are extracted from the target image for multiple times, attribute features are extracted from the original image for multiple times, and the face features extracted for multiple times and the attribute features extracted for multiple times are fused to obtain a face changing image. Therefore, the face changing image is obtained by fusing the face features extracted for many times and the attribute features extracted for many times, so that the face changing image can contain more face features and attribute features, a real and natural face changing image can be obtained, and the face changing effect of the face changing image is effectively improved.

In addition, the face changing image is obtained by fusing the face features extracted for many times and the attribute features extracted for many times, so that the face changing image can contain more face features and attribute features, the problem of poor face changing effect caused by overlarge brightness difference between the original image and the target image can be solved, and the face changing effect of the face changing image is improved.

For example, in order to further obtain a high-quality face-changed image, after the face-changed image is obtained through the face-changed network model, the face-changed image may be further subjected to super-resolution processing, for example, the face-changed image may be subjected to 4-fold super-resolution processing through a DFDNet super-resolution network, so as to obtain a super-resolved target face-changed image; and the target face-changing image is output, so that the high-quality target face-changing image can be displayed, and the effect of the face-changing image is further improved.

The implementation principle and the beneficial effect of the image face changing method shown in the embodiment of the application are similar to those of the face changing network model training method, and the implementation principle and the beneficial effect of the face changing network model training method can be referred to, and are not repeated herein.

Fig. 7 is a schematic structural diagram of a training device 70 for a face-changing network model according to an embodiment of the present application, for example, please refer to fig. 7, where the training device 70 for a face-changing network model may include:

an obtaining unit 701 is configured to obtain a plurality of image sample pairs, where each image sample pair includes an original image sample and a target image sample corresponding to the original image sample.

The processing unit 702 is configured to, for each image sample, input an original image sample and a target image sample in the image sample pair into an initial face-changing network model, extract human face features from the target image sample multiple times and extract attribute features from the original image sample multiple times, and fuse the human face features extracted multiple times and the attribute features extracted multiple times to obtain a face-changing image sample corresponding to the image sample pair.

And the training unit 703 is configured to update the network parameters of the initial face-changing network model according to each image sample pair and the face-changing image sample corresponding to each image sample pair.

Optionally, the initial face-changing network model includes an attribute feature extraction model, a human face feature extraction model, and a feature fusion model.

The processing unit 702 is specifically configured to input a target image sample into a face feature extraction model, extract face features from the target image sample for multiple times, and output multiple face features; inputting an original image sample into an attribute feature extraction model, extracting attribute features from the original image sample for multiple times, and outputting multiple attribute features; and inputting the attribute features and the face features into a feature fusion model for fusion to obtain a face-changed image sample corresponding to the image sample.

Optionally, the face feature extraction model includes a first downsampling network and a first upsampling network, the first downsampling network includes a plurality of first convolution layers connected in sequence, and the first upsampling network includes a plurality of second convolution layers connected in sequence.

The processing unit 702 is specifically configured to input the target image sample into the face feature extraction model, extract the face features in the target image sample through the last first convolution layer in the first downsampling network, and extract the face features in the target image sample through each second convolution layer in the first upsampling network, so as to extract the face features multiple times.

Optionally, the attribute feature extraction model includes a second downsampling network and a second upsampling network, the second downsampling network includes a plurality of third convolutional layers, and the second upsampling network includes a plurality of fourth convolutional layers.

The processing unit 702 is specifically configured to input the original image sample into the attribute feature extraction model, extract the attribute features in the original image sample through the last third convolutional layer in the second down-sampling network, and extract the attribute features in the original image sample through each fourth convolutional layer in the second up-sampling network, so as to extract the attribute features multiple times.

Optionally, the feature fusion model includes a first feature fusion layer and a plurality of second feature fusion layers that are sequentially connected, and the plurality of second feature fusion layers correspond to the plurality of second convolution layers and the plurality of fourth convolution layers one to one, respectively.

The processing unit 702 is specifically configured to fuse, by using the first feature fusion layer, the face feature extracted by the last first convolution layer and the attribute feature extracted by the last third convolution layer to obtain a fused image; and for each second feature fusion layer, fusing the face features extracted from the corresponding second convolution layer, the attribute features extracted from the corresponding fourth convolution layer and the fused image fused by the previous feature fusion layer, and determining the fused image fused by the last second feature fusion layer as the face change image sample corresponding to the image sample pair.

Optionally, the training unit 703 is specifically configured to perform a super-resolution process on the target image samples in each image sample pair to obtain super-resolution target image samples of each image sample; carrying out hyper-resolution processing on the corresponding face-changed image samples to obtain hyper-resolution face-changed image samples corresponding to the image samples; generating a mixed image sample corresponding to each image sample pair according to the super-divided target image sample of each image sample pair and the super-divided face image sample; and updating the network parameters of the initial face-changing network model according to the original image sample, the target image sample, the corresponding face-changing image sample and the mixed image sample of each image sample pair.

Optionally, the training unit 703 is specifically configured to, for each image sample pair, generate a mixed image sample corresponding to each sample pair in a specific manner as follows:

identifying a first face region from a hyper-target image sample of the image sample pair; identifying a second face region from the hyper-conversion face image sample corresponding to the image sample; carrying out segmentation processing on the first face area to obtain a plurality of first image area blocks; carrying out segmentation processing on the second face area to obtain a plurality of second image area blocks; and selecting image area blocks from the plurality of first image area blocks and the plurality of second image area blocks for splicing to generate a mixed image sample corresponding to the image sample pair.

Optionally, the training unit 703 is specifically configured to determine, according to an original image sample, a target image sample, a corresponding face-change image sample, and a mixed image sample of each image sample pair, a first loss function, a second loss function, a third loss function, and a fourth loss function corresponding to each image sample pair; and updating the network parameters of the initial face changing network model according to the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair.

Optionally, the training unit 703 is specifically configured to determine, for each image sample pair, a first loss function, a second loss function, a third loss function, and a fourth loss function corresponding to each image sample pair in a specific manner:

inputting the face-changed image samples corresponding to the image samples into a face feature extraction model to obtain first face features corresponding to the face-changed image samples, and determining first loss functions corresponding to the image samples according to the distance between the first face features and the second face features; wherein the second face features are obtained based on face features extracted from the target image sample for a plurality of times; inputting the face-changed image samples corresponding to the image sample pairs into an attribute feature extraction model to obtain first attribute features corresponding to the face-changed image samples, and determining second loss functions corresponding to the image sample pairs according to the distance between the first attribute features and the second attribute features; the second attribute features are obtained based on attribute features extracted from the original image sample for multiple times; inputting the mixed image sample corresponding to the image sample pair into the countermeasure detection network to obtain a detection result of the image sample pair, and determining a third loss function corresponding to the image sample pair according to the detection result; and determining a fourth loss function corresponding to the image sample pair according to the face-changed image sample and the target image sample corresponding to the image sample pair.

Optionally, the training unit 703 is specifically configured to determine a target loss function corresponding to each image sample pair according to a first loss function, a second loss function, a third loss function, and a fourth loss function corresponding to each image sample pair; and updating the network parameters of the initial face changing network model according to the target loss function corresponding to each image sample pair.

The training device 70 for the face-changing network model shown in the embodiment of the present application can execute the technical scheme of the training method for the face-changing network model in the above embodiments, and its implementation principle and beneficial effect are similar to those of the training method for the face-changing network model, and reference may be made to the implementation principle and beneficial effect of the training method for the face-changing network model, which are not described herein again.

Fig. 8 is a schematic structural diagram of an image face-changing device 80 according to an embodiment of the present application, and for example, please refer to fig. 8, the image face-changing device 80 may include:

an acquisition unit 801 is configured to acquire an original image and a target image including a target face.

The processing unit 802 is configured to input an original image and a target image into a face-changing network model, extract facial features from the target image multiple times, extract attribute features from the original image multiple times, and fuse the extracted facial features and the extracted attribute features multiple times to obtain a face-changing image.

Optionally, the image resurfacing device 80 further comprises an output unit 803.

The processing unit 802 is further configured to perform a super-resolution process on the face-changed image to obtain a target face-changed image.

An output unit 803 for outputting the target face-changed image.

The image face changing device 80 shown in the embodiment of the present application can execute the technical solution of the image face changing method in the above embodiment, and the implementation principle and the beneficial effect thereof are similar to those of the image face changing method, and reference may be made to the implementation principle and the beneficial effect of the image face changing method, which is not repeated here.

Fig. 9 is a schematic structural diagram of an electronic device 90 provided in an embodiment of the present application, and for example, please refer to fig. 9, the electronic device 90 may include a processor 901 and a memory 902; wherein the content of the first and second substances,

a memory 902; for storing a computer program;

the processor 901 is configured to read the computer program stored in the memory 902, and execute the training method of the face-changing network model according to the embodiment described above according to the computer program in the memory 902, or execute the image face-changing method according to the embodiment described above.

Alternatively, the memory 902 may be separate or integrated with the processor 901. When the memory 902 is a separate device from the processor 901, the electronic device 90 may further include: a bus for connecting the memory 902 and the processor 901.

Optionally, this embodiment further includes: a communication interface that may be connected to the processor 901 via a bus. The processor 901 may control the communication interface to implement the acquisition and transmission functions of the electronic device 90 described above.

For example, in the embodiment of the present application, the electronic device 90 may be a terminal, or may also be a server, and may be specifically configured according to actual needs.

The electronic device 90 shown in the embodiment of the present application may execute the technical scheme of the training method for the face changing network model in the above embodiment, and its implementation principle and beneficial effect are similar to those of the training method for the face changing network model, and may refer to the implementation principle and beneficial effect of the training method for the face changing network model, or execute the technical scheme of the image face changing method in the above embodiment, and its implementation principle and beneficial effect are similar to those of the image face changing method, and may refer to the implementation principle and beneficial effect of the image face changing method, which is not described herein again.

The embodiment of the present application further provides a computer-readable storage medium, where a computer execution instruction is stored in the computer-readable storage medium, and when a processor executes the computer execution instruction, a technical scheme of a training method for a face changing network model in the foregoing embodiment is implemented, and an implementation principle and a beneficial effect of the technical scheme are similar to those of the training method for the face changing network model, which can be referred to as the implementation principle and the beneficial effect of the training method for the face changing network model, or a technical scheme of an image face changing method in the foregoing embodiment is implemented, and an implementation principle and a beneficial effect of the technical scheme are similar to those of the image face changing method, which can be referred to as the implementation principle and the beneficial effect of the image face changing method, and are not described herein again.

The embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the technical scheme of the training method for the face changing network model in the foregoing embodiments is implemented, and the implementation principle and the beneficial effect of the training method for the face changing network model are similar to those of the training method for the face changing network model, which can be referred to as the implementation principle and the beneficial effect of the training method for the face changing network model, or the technical scheme of the image face changing method in the foregoing embodiments is implemented, and the implementation principle and the beneficial effect of the image face changing method are similar to those of the image face changing method, which can be referred to as the implementation principle and the beneficial effect of the image face changing method, and details are not repeated herein.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.

It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.

The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A training method of a face changing network model is characterized by comprising the following steps:

2. The method according to claim 1, wherein the initial face-changing network model comprises an attribute feature extraction model, a human face feature extraction model and a feature fusion model;

wherein, the extracting the human face features from the target image sample for multiple times comprises: inputting the target image sample into the human face feature extraction model, extracting human face features from the target image sample for multiple times, and outputting multiple human face features;

the extracting attribute features from the original image sample for multiple times includes: inputting the original image sample into the attribute feature extraction model, extracting attribute features from the original image sample for multiple times, and outputting multiple attribute features;

the method for fusing the face features extracted for many times and the attribute features extracted for many times to obtain face-changed image samples corresponding to the image samples comprises the following steps: and inputting the attribute features and the face features into the feature fusion model for fusion to obtain face change image samples corresponding to the image samples.

3. The method of claim 2, wherein the face feature extraction model comprises a first downsampling network and a first upsampling network, the first downsampling network comprising a plurality of first convolutional layers connected in sequence, the first upsampling network comprising a plurality of second convolutional layers connected in sequence;

the inputting the target image sample into the facial feature extraction model, and extracting facial features from the target image sample for multiple times, includes:

inputting the target image sample into the face feature extraction model, extracting the face features in the target image sample through the last first convolution layer in the first down-sampling network, and extracting the face features in the target image sample through each second convolution layer in the first up-sampling network respectively so as to extract the face features for multiple times.

4. The method of claim 3, wherein the attribute feature extraction model comprises a second downsampling network comprising a plurality of third convolutional layers and a second upsampling network comprising a plurality of fourth convolutional layers;

the inputting the original image sample into the attribute feature extraction model, and extracting the attribute features from the original image sample for multiple times, includes:

inputting the original image sample into the attribute feature extraction model, extracting the attribute features in the original image sample through the last third convolution layer in the second down-sampling network, and extracting the attribute features in the original image sample through each fourth convolution layer in the second up-sampling network respectively so as to extract the attribute features for multiple times.

5. The method according to claim 4, wherein the feature fusion model comprises a first feature fusion layer and a plurality of second feature fusion layers connected in sequence, and the plurality of second feature fusion layers respectively correspond to the plurality of second convolution layers and the plurality of fourth convolution layers in a one-to-one manner;

inputting the attribute features and the face features into the feature fusion model for fusion, including:

fusing the face features extracted from the last first convolution layer and the attribute features extracted from the last third convolution layer through the first feature fusion layer to obtain a fused image;

and for each second feature fusion layer, fusing the face features extracted from the corresponding second convolution layer, the attribute features extracted from the corresponding fourth convolution layer and the fused image fused by the previous feature fusion layer, and determining the fused image fused by the last second feature fusion layer as the face change image sample corresponding to the image sample pair.

6. The method according to any one of claims 2 to 5, wherein the updating the network parameters of the initial face-changed network model according to the image sample pairs and the face-changed image samples corresponding to the image sample pairs comprises:

performing super-division processing on the target image samples in each image sample pair to obtain super-divided target image samples of each image sample;

carrying out hyper-resolution processing on the face-changed image samples corresponding to the image samples to obtain hyper-resolution face-changed image samples corresponding to the image samples;

generating a mixed image sample corresponding to each image sample pair according to the super-divided target image sample of each image sample pair and the super-divided face image sample;

and updating the network parameters of the initial face-changing network model according to the original image sample, the target image sample, the corresponding face-changing image sample and the mixed image sample of each image sample pair.

7. The method according to claim 6, wherein for each image sample pair, a specific way of generating the mixed image sample corresponding to each sample pair is as follows:

identifying a first face region from a hyper-target image sample of the pair of image samples; identifying a second face region from the hyper-converted face image sample corresponding to the image sample pair;

performing segmentation processing on the first face area to obtain a plurality of first image area blocks; carrying out segmentation processing on the second face area to obtain a plurality of second image area blocks;

selecting image area blocks from the plurality of first image area blocks and the plurality of second image area blocks for splicing, and generating the mixed image sample corresponding to the image sample.

8. The method of claim 6, wherein updating the network parameters of the initial face-change network model based on the original image samples, the target image samples, the corresponding face-change image samples, and the blended image samples of the image sample pairs comprises:

determining a first loss function, a second loss function, a third loss function and a fourth loss function corresponding to each image sample pair according to the original image sample, the target image sample, the corresponding face-changed image sample and the mixed image sample of each image sample pair;

and updating the network parameters of the initial face changing network model according to the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair.

9. The method according to claim 8, wherein the specific manner for determining the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair is as follows:

inputting the face change image samples corresponding to the image sample pairs into the face feature extraction model to obtain first face features corresponding to the face change image samples, and determining the first loss function corresponding to the image sample pairs according to the distance between the first face features and the second face features; wherein the second facial features are derived based on facial features extracted from the target image sample a plurality of times;

inputting the face-changed image samples corresponding to the image sample pairs into the attribute feature extraction model to obtain first attribute features corresponding to the face-changed image samples, and determining second loss functions corresponding to the image sample pairs according to the distance between the first attribute features and the second attribute features; the second attribute features are obtained based on attribute features extracted from the original image sample for multiple times;

inputting the mixed image sample corresponding to the image sample pair into a countermeasure detection network to obtain a detection result of the image sample pair, and determining the third loss function corresponding to the image sample pair according to the detection result;

and determining the fourth loss function corresponding to the image sample pair according to the face-changed image sample corresponding to the image sample pair and the target image sample.

10. The method according to claim 8, wherein the updating the network parameters of the initial face-changing network model according to the first loss function, the second loss function, the third loss function, and the fourth loss function corresponding to each image sample pair comprises:

determining a target loss function corresponding to each image sample pair according to the first loss function, the second loss function, the third loss function and the fourth loss function corresponding to each image sample pair;

and updating the network parameters of the initial face changing network model according to the target loss function corresponding to each image sample pair.

11. An image face changing method, comprising:

acquiring an original image and a target image comprising a target face;

12. The method of claim 11, further comprising:

carrying out super-resolution processing on the face-changed image to obtain a target face-changed image;

and outputting the target face changing image.

13. A training device for a face-changing network model is characterized by comprising:

14. An image face changing device, comprising:

15. An electronic device, comprising: a processor and a memory;

a memory; for storing a computer program;

the processor is configured to read the computer program stored in the memory, and execute the training method of the face-changing network model according to any one of claims 1 to 10 or execute the image face-changing method according to any one of claims 11 to 12 according to the computer program in the memory.

16. A readable storage medium, in which a computer program is stored computer-executable instructions for implementing a method for training a face-changing network model according to any one of claims 1 to 10, or for implementing an image face-changing method according to any one of claims 11 to 12, when the computer program is executed by a processor.

17. A computer program product, characterized in that the computer program product comprises a computer program which, when executed, implements a method of training a facetted network model as claimed in any one of claims 1 to 10; alternatively, the computer program when executed implements the image facelining method of any of claims 11-12.