CN111523413B - Method and device for generating face image


Info

Publication number
CN111523413B
CN111523413B (application CN202010281600.5A)
Authority
CN
China
Prior art keywords
face image
features
trained
network
sample
Prior art date
Legal status
Active
Application number
CN202010281600.5A
Other languages
Chinese (zh)
Other versions
CN111523413A (en)
Inventor
希滕
张刚
温圣召
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-10
Filing date: 2020-04-10
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010281600.5A
Publication of CN111523413A
Application granted
Publication of CN111523413B
Legal status: Active


Classifications

    • G06V 40/168 Human faces: Feature extraction; Face representation
    • G06N 3/045 Neural network architecture: Combinations of networks
    • G06N 3/084 Neural network learning methods: Backpropagation, e.g. using gradient descent
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06V 40/172 Human faces: Classification, e.g. identification
    • G06T 2207/20221 Special algorithmic details: Image fusion; Image merging
    • G06T 2207/30201 Subject of image: Face

Abstract

Embodiments of the present disclosure disclose a method and apparatus for generating a face image, relating to the field of image processing. The method comprises the following steps: inputting a first face image into a trained texture feature generation network to generate texture features of the first face image; inputting a second face image into a trained face recognition network for identity feature extraction to obtain identity features of the second face image; and splicing the texture features of the first face image with the identity features of the second face image to form spliced features, and decoding the spliced features with a pre-trained decoder to obtain a synthesized face image that fuses the texture features of the first face image with the identity features of the second face image. The method can produce high-quality synthesized face images.

Description

Method and device for generating face image
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, in particular to the field of image processing, and more particularly to a method and apparatus for generating a face image.
Background
Image synthesis is an important technology in the field of image processing. In current image processing technology, image synthesis is generally performed by "matting": segmenting part of the content of one image and pasting it into another image.
Synthesis of face images can be applied to creating virtual characters, enriching the functionality of image and video applications. For face image synthesis, however, matting requires complicated manual operation, and the pose and expression of a matted face are usually unnatural, so the quality of the synthesized face image is poor.
Disclosure of Invention
Embodiments of the present disclosure provide methods and apparatus, electronic devices, and computer-readable media for generating face images.
In a first aspect, embodiments of the present disclosure provide a method of generating a face image, including: inputting the first face image into a trained texture feature generation network to generate texture features of the first face image; inputting the second face image into a trained face recognition network for identity feature extraction to obtain the identity feature of the second face image; and splicing the texture features of the first face image and the identity features of the second face image to form spliced features, and decoding the spliced features by using a pre-trained decoder to obtain a synthesized face image fused with the texture features of the first face image and the identity features of the second face image.
In some embodiments, the texture feature generation network comprises the generator of a trained generative adversarial network. The discriminator of the generative adversarial network is used to discriminate whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
In some embodiments, the above method further comprises: training the texture feature generation network and the decoder based on a first sample face image set. Training the texture feature generation network and the decoder based on the first sample face image set comprises: taking the generator of a preset generative adversarial network as the texture feature generation network to be trained, and acquiring the decoder to be trained; inputting a first sample face image in the first sample face image set into the texture feature generation network to be trained to obtain texture features of the first sample face image; inputting the first sample face image into the trained face recognition network for identity feature extraction to obtain identity features of the first sample face image; splicing the texture features and the identity features of the first sample face image to obtain fusion features of the first sample face image; decoding the fusion features of the first sample face image with the decoder to be trained to obtain a predicted face image corresponding to the first sample face image; and discriminating the first sample face image and the corresponding predicted face image with the discriminator of the preset generative adversarial network, and iteratively adjusting the parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained according to the discrimination error of the discriminator.
In some embodiments, the above method further comprises: training the face recognition network based on a second sample face image set, wherein each second sample face image in the second sample face image set includes identity information of the corresponding face. Training the face recognition network based on the second sample face image set comprises: supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is inversely related to the face recognition network's ability to distinguish face images with different identity information.
In some embodiments, the above method further comprises: constructing a synthesized face image set based on the synthesized face images; and training a face-change detection model using the synthesized face image set.
In a second aspect, embodiments of the present disclosure provide an apparatus for generating a face image, comprising: a first extraction unit configured to input a first face image into a trained texture feature generation network to generate texture features of the first face image; a second extraction unit configured to input a second face image into a trained face recognition network for identity feature extraction to obtain identity features of the second face image; and a synthesis unit configured to splice the texture features of the first face image and the identity features of the second face image to form spliced features, and to decode the spliced features with a pre-trained decoder to obtain a synthesized face image that fuses the texture features of the first face image with the identity features of the second face image.
In some embodiments, the texture feature generation network comprises the generator of a trained generative adversarial network. The discriminator of the generative adversarial network is used to discriminate whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
In some embodiments, the apparatus further comprises: a first training unit configured to train the texture feature generation network and the decoder based on a first sample face image set in the following manner: taking the generator of a preset generative adversarial network as the texture feature generation network to be trained, and acquiring the decoder to be trained; inputting a first sample face image in the first sample face image set into the texture feature generation network to be trained to obtain texture features of the first sample face image; inputting the first sample face image into the trained face recognition network for identity feature extraction to obtain identity features of the first sample face image; splicing the texture features and the identity features of the first sample face image to obtain fusion features of the first sample face image; decoding the fusion features of the first sample face image with the decoder to be trained to obtain a predicted face image corresponding to the first sample face image; and discriminating the first sample face image and the corresponding predicted face image with the discriminator of the preset generative adversarial network, and iteratively adjusting the parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained according to the discrimination error of the discriminator.
In some embodiments, the apparatus further comprises: a second training unit configured to train the face recognition network based on a second sample face image set, wherein each second sample face image in the second sample face image set includes identity information of the corresponding face. The second training unit is configured to train the face recognition network based on the second sample face image set in the following manner: supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is inversely related to the face recognition network's ability to distinguish face images with different identity information.
In some embodiments, the apparatus further comprises: a construction unit configured to construct a synthesized face image set based on the synthesized face images; and a third training unit configured to train a face-change detection model using the synthesized face image set.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of generating a face image as provided in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method of generating a face image provided in the first aspect.
According to the method and apparatus for generating a face image of the embodiments of the present disclosure, the first face image is input into a trained texture feature generation network to generate its texture features; the second face image is input into a trained face recognition network for identity feature extraction to obtain its identity features; the texture features of the first face image and the identity features of the second face image are spliced to form spliced features; and a pre-trained decoder decodes the spliced features to obtain a synthesized face image that fuses the texture features of the first face image with the identity features of the second face image. A high-quality synthesized face image can thereby be obtained.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method of generating a face image according to the present disclosure;
FIG. 3 is a schematic diagram of a training flow of a texture feature generation network and decoder;
FIG. 4 is a schematic diagram of one implementation of a training process for a texture feature generation network and decoder;
FIG. 5 is a schematic structural view of one embodiment of an apparatus of the present disclosure for generating a face image;
fig. 6 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which the methods of generating a face image or apparatus of generating a face image of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be user side devices on which various applications may be installed. Such as image/video processing class applications, payment applications, social platform class applications, and so forth. The user 110 may upload face images using the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, electronic book readers, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server running various services, such as a server providing background support for video-type applications running on the terminal devices 101, 102, 103. The server 105 may receive a face image synthesis request sent by the terminal devices 101, 102, 103, synthesize the face image requested to be synthesized, obtain a synthesized face image, and feed back the synthesized face image or a synthesized face video formed by the synthesized face image to the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may present the composite face image or the composite face video to the user 110.
The server 105 may also receive the image or video data uploaded by the terminal devices 101, 102, 103 to construct a sample face image set corresponding to the neural network model of various application scenarios in the face image or video processing technology.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
In some scenarios, the terminal devices 101, 102, 103 may deploy and run a trained neural network locally, with the trained neural network being utilized to perform the synthesis of face images.
It should be noted that, the method for generating a face image provided by the embodiment of the present disclosure may be performed by the terminal device 101, 102, 103 or the server 105, and accordingly, the apparatus for generating a face image may be provided in the terminal device 101, 102, 103 or the server 105.
In some scenarios, the server 105 may obtain the required data (e.g., training samples and pairs of face images to be synthesized) from a database, memory, or other device, in which case the exemplary system architecture 100 may omit the terminal devices 101, 102, 103 and the network 104.
Alternatively, the terminal devices 101, 102, 103 may obtain the face images to be synthesized submitted by the user locally, in which case the exemplary system architecture 100 may omit the network 104 and the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of generating a face image according to the present disclosure is shown. The method for generating the face image comprises the following steps:
Step 201, inputting the first face image into a trained texture feature generation network to generate texture features of the first face image.
In the present embodiment, an execution subject (such as a server shown in fig. 1) of the method of generating a face image may acquire a first face image and a second face image. The first face image may be a face image of a first user, and the second face image may be a face image of a second user. The first user and the second user may be different users, and both may have different user identities.
The first face image provides the texture information for the synthesized face image; the texture information characterizes the pose and expression of the face. The second face image provides the identity information for the synthesized face image; the identity information characterizes the identity of the user to whom the face belongs and distinguishes that user from other users.
In practice, the above-described execution subject may acquire the first face image and the second face image submitted or selected by the user in response to a face synthesis request issued by the user. For example, in an exemplary scenario, a user may select a face image found on the network as the first face image and upload an image of his or her own face as the second face image.
After the first face image is obtained, the trained texture feature generation network can be utilized to extract texture features of the first face image, so that the texture features of the first face image are obtained.
The texture feature generation network may be trained in advance based on a sample face image containing expression class annotation information. In a specific implementation manner, the texture feature of the face image of the user can be extracted by using the texture feature generation network to be trained, the expression type of the face in the face image of the user can be identified based on the extracted texture feature, the identification error is determined according to the labeling information, the parameter of the texture feature generation network is iteratively adjusted by adopting an error back propagation method, and the trained texture feature generation network is obtained after multiple iterations.
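By way of illustration only, that pre-training scheme could look like the following PyTorch sketch. The encoder architecture, the 256-dimensional feature size, and the assumption of seven expression classes are hypothetical choices, not specified by the disclosure.

```python
import torch
import torch.nn as nn

class TextureEncoder(nn.Module):
    """Hypothetical texture feature generation network: any CNN mapping a
    face image to a fixed-length texture feature vector would serve."""
    def __init__(self, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

encoder = TextureEncoder()
expression_head = nn.Linear(256, 7)   # assumed: 7 expression classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(expression_head.parameters()), lr=1e-4)

def pretrain_step(images, expression_labels):
    # Identify the expression type from the extracted texture features,
    # then adjust parameters by error back-propagation, as described above.
    logits = expression_head(encoder(images))
    loss = criterion(logits, expression_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```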
Step 202, inputting the second face image into a trained face recognition network for identity feature extraction, and obtaining the identity features of the second face image.
The trained face recognition network is used for recognizing the identity of the corresponding user according to the input face image. In this embodiment, the trained face recognition network may be used to perform feature extraction on the second face image, and the extracted feature is used as the identity feature of the second face image.
The face recognition network comprises a feature extraction network and a classifier. The face recognition network may be implemented as a convolutional neural network, such as a Resnet network, where the feature extraction network may, for example, comprise a plurality of convolutional and pooling layers. The feature extraction of the second face image may be performed by the feature extraction network in the face recognition network.
In practice, the last fully-connected layer of the feature extraction network in the trained face recognition network may be deleted, and the output of the layer immediately preceding it taken as the identity feature of the second face image.
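A minimal sketch of this truncation, using a torchvision ResNet as a stand-in for the feature extraction network (the disclosure names Resnet as one option); the untrained weights and the 224×224 placeholder input are assumptions, and a real system would load a network already trained for face recognition.

```python
import torch
import torch.nn as nn
from torchvision import models

# Untrained stand-in; in practice, load weights from a trained face
# recognition network instead.
backbone = models.resnet18(weights=None)

# Delete the last fully-connected layer; the output of the layer before
# it (here, global average pooling) serves as the identity feature.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

with torch.no_grad():
    second_face = torch.randn(1, 3, 224, 224)                     # placeholder image
    identity_feature = feature_extractor(second_face).flatten(1)  # shape (1, 512)
```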
Step 203, splicing the texture features of the first face image and the identity features of the second face image to form spliced features, and decoding the spliced features with a pre-trained decoder to obtain a synthesized face image fused with the texture features of the first face image and the identity features of the second face image.
The texture features of the first face image generated in step 201 and the identity features of the second face image extracted in step 202 may be spliced through a concat (splicing) operation, or the texture features and identity features may be spliced in a weighted manner according to a preset rule, so as to obtain the spliced features.
The spliced features may then be decoded using a pre-trained decoder, which decodes features to obtain an image, resulting in a synthesized face image. The synthesized face image fuses the texture features of the first face image with the identity features of the second face image.
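Taken together, steps 201 to 203 reduce to a feature concatenation followed by one decoder forward pass. The sketch below continues the hypothetical modules from the neighboring examples: `encoder` from the expression pre-training sketch, `feature_extractor` from the identity sketch, and a `decoder` of the deconvolutional kind sketched in the training subsection further on.

```python
import torch

first_face = torch.randn(1, 3, 224, 224)    # placeholder input images
second_face = torch.randn(1, 3, 224, 224)

texture = encoder(first_face)                           # e.g. shape (1, 256)
identity = feature_extractor(second_face).flatten(1)    # e.g. shape (1, 512)

# The concat (splicing) operation along the feature dimension.
spliced = torch.cat([texture, identity], dim=1)         # shape (1, 768)

# A weighted splice per a preset rule is also possible, for example:
# spliced = torch.cat([0.7 * texture, 0.3 * identity], dim=1)

synthesized = decoder(spliced)                          # synthesized face image
```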
The decoder may be trained in advance on sample face images. Specifically, features of a sample face image may be extracted by a trained neural network model that is applied to face images and contains a feature extraction network, such as a face recognition network, a face model construction network, or a face image compression network. The decoder to be trained then decodes the features of the sample face image to obtain a restored face image. A loss function is constructed from the difference between each sample face image and its corresponding restored face image, and the training of the decoder is supervised with this loss function.
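That reconstruction-based supervision might be realized as follows; the L1 image distance is an assumed choice, since the disclosure only requires a loss built from the difference between each sample face image and its restored counterpart.

```python
import torch
import torch.nn.functional as F

def decoder_pretrain_step(sample_faces, trained_feature_net, decoder, optimizer):
    with torch.no_grad():                        # feature network stays fixed
        features = trained_feature_net(sample_faces).flatten(1)
    # The feature dimension must match the decoder's input dimension.
    restored = decoder(features)                 # decoder under training
    loss = F.l1_loss(restored, sample_faces)     # difference-based supervision
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```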
According to the method for generating a face image provided by this embodiment, the first face image is input into a trained texture feature generation network to generate its texture features; the second face image is input into a trained face recognition network for identity feature extraction to obtain its identity features; and the texture features of the first face image and the identity features of the second face image are spliced to form spliced features, which a pre-trained decoder decodes to obtain a synthesized face image fusing the texture features of the first face image with the identity features of the second face image.
In some embodiments, the texture feature generation network comprises the generator of a trained generative adversarial network, and the discriminator of the generative adversarial network is used to discriminate whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
Specifically, the generator of the trained generative adversarial network can be used as the texture feature generation network. When the generative adversarial network is trained, the generator extracts the texture features of a test face image, a face image is restored based on those texture features and the identity features extracted from the same test face image by the face recognition network, and the discriminator discriminates whether the restored face image is consistent with the corresponding test image. If, according to the discriminator, the probability that the restored face image is consistent with the corresponding test image is far above or far below 0.5, the parameters of the generator can be adjusted so that the discriminator finds it harder to judge accurately whether the restored face image is consistent with the corresponding test image. In this way the generator and the discriminator are trained adversarially, and after training the generator can generate texture features for which the discriminator can hardly judge whether the restored face image is consistent with the corresponding test image.
Current methods for training face synthesis models find it difficult to train a reliable model, because large amounts of sample data containing annotation information are hard to acquire. In the present method, the texture feature extraction network is trained on unannotated test images, each test image serving as its own supervision; no large set of paired data needs to be constructed as samples for training the generative adversarial network, which effectively reduces the annotation cost of sample data for training the face synthesis model.
Further, the texture feature generation network and decoder described above may be trained by the process shown in fig. 3.
As shown in fig. 3, the texture feature generation network and the decoder may be trained on the first sample face image set in the following flow:
First, in step 301, the generator of a preset generative adversarial network is taken as the texture feature generation network to be trained, and the decoder to be trained is acquired.
Specifically, the preset generator and the preset discriminator of the generative adversarial network can be constructed from a convolutional neural network and a classifier, respectively, and the parameters of the generator and the discriminator may be pre-trained or randomly initialized. The decoder to be trained may be a deconvolutional neural network.
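A minimal deconvolutional decoder of the kind named here, assuming a 768-dimensional spliced feature and a 64×64 RGB output; every size in the sketch is illustrative.

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Restores an image from a spliced feature vector via deconvolutions."""
    def __init__(self, feature_dim=768):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),     # -> 64x64
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 4, 4)
        return self.deconv(x)
```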
In step 302, a first sample face image in the first sample face image set is input to a texture feature generation network to be trained, and texture features of the first sample face image are obtained.
The texture feature generation network to be trained can extract texture features in the first sample face image to obtain the texture features of the first sample face image.
In step 303, the first sample face image is input into a trained face recognition network for identity feature extraction, so as to obtain identity features of the first sample face image.
In this embodiment, the trained face recognition network may be used to perform the extraction of the identity feature from the first sample face image, and the specific implementation may refer to step 202 in the foregoing embodiment.
Thereafter, in step 304, the texture features and the identity features of the first sample face image are spliced to obtain the fusion features of the first sample face image.
The texture features and the identity features of the first sample face image can be spliced through concat operation or according to a preset rule to obtain fusion features of the first sample face image.
In step 305, the fusion feature of the first sample face image is decoded by using the decoder to be trained, so as to obtain a predicted face image corresponding to the first sample face image.
The fused features are then input to a decoder to be trained for decoding, which may restore the image based on the features. And taking the decoding result of the fusion characteristic by the decoder to be trained as a predicted face image corresponding to the first sample face image.
In step 306, the first sample face image and the corresponding predicted face image are discriminated using the discriminator of the preset generative adversarial network, and the parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained are iteratively adjusted according to the discrimination error of the discriminator.
The discriminator discriminates whether the first sample face image and the corresponding predicted face image are real face images, that is, whether a given input is the first sample face image or the predicted face image. The discrimination error of the discriminator may be determined based on at least one of: the probability that the discriminator judges the predicted face image to be the corresponding first sample face image, and the probability that the discriminator judges the first sample face image to be the predicted face image. The larger the probability that the discriminator judges the predicted face image to be the corresponding first sample face image, the larger the discrimination error; likewise, the larger the probability that the discriminator judges the first sample face image to be the predicted face image, the larger the discrimination error.
When the discrimination error of the discriminator exceeds a preset range, the parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained can be adjusted by error back-propagation, yielding updated versions of all three, after which the next training iteration is executed from step 302.
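One way steps 302 to 306 could be realized as a single training iteration is sketched below. The non-saturating binary cross-entropy losses are a conventional GAN formulation rather than something the disclosure mandates, and all module interfaces (feature sizes, a decoder output matching the sample image size) are assumptions.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_iteration(sample_faces, texture_net, id_net, decoder,
                    discriminator, opt_g, opt_d):
    # Steps 302-305: extract texture and identity features, splice, decode.
    texture = texture_net(sample_faces)
    with torch.no_grad():                           # the trained face recognition
        identity = id_net(sample_faces).flatten(1)  # network is kept fixed
    fused = torch.cat([texture, identity], dim=1)
    predicted = decoder(fused)

    # Step 306: the discriminator judges sample vs. predicted face images.
    d_real = discriminator(sample_faces)
    d_fake = discriminator(predicted.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # The texture network and decoder are adjusted from the discriminator's
    # error, pushing predicted images toward indistinguishability.
    d_fake_for_g = discriminator(predicted)
    loss_g = bce(d_fake_for_g, torch.ones_like(d_fake_for_g))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```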
Fig. 4 shows a schematic diagram of the principle of the above-described flow 300. As shown in fig. 4, the first sample face image SImage is input into the texture feature generation network to be trained and the trained face recognition network respectively, to obtain texture features and ID features (identity features); after the texture features and the ID features are spliced, the decoder to be trained performs feature decoding to obtain a restored face image RImage. A corresponding loss function can be constructed based on the discriminator's ability to distinguish the restored face image RImage from the first sample face image SImage, the discrimination error determined, and the error propagated back to adjust the texture feature generation network to be trained and the decoder to be trained.
Through the above-described flow 300, the method for generating a face image of this embodiment can train the texture feature generation network and the decoder using a low-cost sample image set. The face recognition network may be decoupled from the decoder and the texture feature generation network, or a better-performing face recognition network may be used directly.
In some embodiments, the method for generating a face image may further include: training the face recognition network based on a second sample face image set, wherein each second sample face image in the second sample face image set includes identity information of the corresponding face. Training the face recognition network based on the second sample face image set comprises: supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is inversely related to the face recognition network's ability to distinguish face images with different identity information.
In these embodiments, the face recognition network may be trained independently of the texture feature generation network and the decoder. The second sample face image set may partially overlap the first sample face image set or be completely different from it. The ability of the face recognition network to distinguish face images with different identity information may be inversely related to the probability that it recognizes face images having different identity information as face images of the same user; the value of the loss function is accordingly positively correlated with that probability. In practice, the probability that the face recognition network recognizes second sample face images having different identity information as face images of the same user may be used as the loss function to supervise its training.
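Read this way, the supervision amounts to an identity-classification loss that shrinks as the network separates identities better. A sketch using plain cross-entropy over identity labels follows; the disclosure does not fix the exact form of the loss, so this is only one consistent instantiation.

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def recognition_train_step(faces, identity_labels, feature_net, id_head, optimizer):
    # id_head: a classifier with one output per identity in the training set.
    logits = id_head(feature_net(faces).flatten(1))
    # Cross-entropy falls as faces are assigned to the correct identities,
    # so the loss is inversely related to the network's ability to
    # distinguish face images with different identity information.
    loss = criterion(logits, identity_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```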
The face recognition network trained in this manner can extract identity features that accurately distinguish different users from face images, so that when it is applied in the face generation method of embodiments of the present disclosure, the accuracy of the extracted features, and hence the quality of the synthesized face images, is improved.
In some embodiments, the synthesized face images obtained by the method described above may be used to train a face-change detection model. Specifically, the method may further include constructing a synthesized face image set based on the synthesized face images, and training a face-change detection model using the synthesized face image set.
The method provided by the above embodiments may be used to generate synthesized face images for a plurality of image pairs, each composed of a different first face image and second face image, so as to obtain a large number of synthesized face image samples. A face-change detection model can then be constructed for detecting whether a face image is a face-swapped image obtained by exchanging the facial features of two images, and trained with real face image samples as positive samples and synthesized face image samples as negative samples. A large number of labeled samples can thus be obtained rapidly, which reduces the sample labeling difficulty for the face-change detection model and can improve its accuracy.
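Training the detection model then reduces to ordinary binary classification over the automatically labelled set. A sketch, with the detector architecture left abstract and assumed to emit one logit per image:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def detector_train_step(real_faces, synthesized_faces, detector, optimizer):
    images = torch.cat([real_faces, synthesized_faces], dim=0)
    # Real face images are positive samples (1); synthesized ones negative (0).
    labels = torch.cat([torch.ones(len(real_faces)),
                        torch.zeros(len(synthesized_faces))])
    logits = detector(images).squeeze(1)
    loss = bce(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```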
Referring to fig. 5, as an implementation of the above method for generating a face image, the present disclosure provides an embodiment of an apparatus for generating a face image, where the embodiment of the apparatus corresponds to the embodiment of the above method, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a face image of the present embodiment includes: a first extraction unit 501, a second extraction unit 502, and a synthesis unit 503. Wherein the first extraction unit 501 is configured to input the first face image into the trained texture feature generation network, and generate the texture features of the first face image; the second extraction unit 502 is configured to input the second face image into a trained face recognition network to perform identity feature extraction, so as to obtain identity features of the second face image; the synthesizing unit 503 is configured to splice the texture feature of the first face image and the identity feature of the second face image to form a spliced feature, and decode the spliced feature by using a pre-trained decoder to obtain a synthesized face image in which the texture feature of the first face image and the identity feature of the second face image are fused.
In some embodiments, the texture feature generation network comprises the generator of a trained generative adversarial network. The discriminator of the generative adversarial network is used to discriminate whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
In some embodiments, the apparatus further comprises: a first training unit configured to train the texture feature generation network and the decoder based on a first sample face image set in the following manner: taking the generator of a preset generative adversarial network as the texture feature generation network to be trained, and acquiring the decoder to be trained; inputting a first sample face image in the first sample face image set into the texture feature generation network to be trained to obtain texture features of the first sample face image; inputting the first sample face image into the trained face recognition network for identity feature extraction to obtain identity features of the first sample face image; splicing the texture features and the identity features of the first sample face image to obtain fusion features of the first sample face image; decoding the fusion features of the first sample face image with the decoder to be trained to obtain a predicted face image corresponding to the first sample face image; and discriminating the first sample face image and the corresponding predicted face image with the discriminator of the preset generative adversarial network, and iteratively adjusting the parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained according to the discrimination error of the discriminator.
In some embodiments, the apparatus further comprises: a second training unit configured to train the face recognition network based on a second sample face image set, wherein each second sample face image in the second sample face image set includes identity information of the corresponding face. The second training unit is configured to train the face recognition network based on the second sample face image set in the following manner: supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is inversely related to the face recognition network's ability to distinguish face images with different identity information.
In some embodiments, the apparatus further comprises: a construction unit configured to construct a synthesized face image set based on the synthesized face images; and a third training unit configured to train a face-change detection model using the synthesized face image set.
The units in the above-described device 500 correspond to the steps in the method described with reference to fig. 2 to 4. Thus, the operations, features and technical effects that can be achieved by the method for generating a face image described above are equally applicable to the apparatus 500 and the units contained therein, and are not described herein again.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server shown in fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, a hard disk; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 601. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: inputting the first face image into a trained texture feature generation network to generate texture features of the first face image; inputting the second face image into a trained face recognition network for identity feature extraction to obtain the identity feature of the second face image; and splicing the texture features of the first face image and the identity features of the second face image to form spliced features, and decoding the spliced features by using a pre-trained decoder to obtain a synthesized face image fused with the texture features of the first face image and the identity features of the second face image.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes a first extraction unit, a second extraction unit, and a synthesis unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the first extraction unit may also be described as "a unit that inputs the first face image into the trained texture feature generation network, and generates the texture feature of the first face image".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in this disclosure is not limited to the specific combination of the features described above, but also covers other technical solutions formed by arbitrary combinations of the features described above or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A method of generating a face image, comprising:
inputting the first face image into a trained texture feature generation network to generate texture features of the first face image;
inputting the second face image into a trained face recognition network for identity feature extraction to obtain the identity feature of the second face image;
splicing the texture features of the first face image and the identity features of the second face image to form spliced features, and decoding the spliced features by using a pre-trained decoder to obtain a synthesized face image fused with the texture features of the first face image and the identity features of the second face image;
wherein the texture feature generation network comprises the generator of a trained generative adversarial network;
the discriminator of the generative adversarial network is used to discriminate: whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
2. The method of claim 1, wherein the method further comprises: training the texture feature generation network and the decoder based on a first set of sample face images;
the training the texture feature generation network and the decoder based on the first sample face image set comprises:
taking the generator of a preset generative adversarial network as the texture feature generation network to be trained, and acquiring the decoder to be trained;
inputting a first sample face image in the first sample face image set to the texture feature generation network to be trained to obtain texture features of the first sample face image;
inputting the first sample face image into a trained face recognition network for identity feature extraction to obtain the identity feature of the first sample face image;
splicing the texture features and the identity features of the first sample face image to obtain fusion features of the first sample face image;
decoding the fusion characteristics of the first sample face image by using a decoder to be trained to obtain a predicted face image corresponding to the first sample face image;
and discriminating the first sample face image and the corresponding predicted face image using the discriminator of the preset generative adversarial network, and iteratively adjusting parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained according to the discrimination error of the discriminator.
3. The method according to claim 1 or 2, wherein the method further comprises:
training the face recognition network based on a second sample face image set, wherein a second sample face image in the second sample face image set comprises identity information of a corresponding face;
the training the face recognition network based on the second sample face image set comprises the following steps:
and supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is inversely related to the face recognition network's ability to distinguish face images with different identity information.
4. The method according to claim 1 or 2, wherein the method further comprises:
constructing a synthetic face image set based on the synthetic face image;
and training a face-change detection model using the synthesized face image set.
5. An apparatus for generating a face image, comprising:
a first extraction unit configured to input a first face image into a trained texture feature generation network, generating texture features of the first face image;
the second extraction unit is configured to input a second face image into the trained face recognition network to extract the identity characteristics, so as to obtain the identity characteristics of the second face image;
the synthesizing unit is configured to splice the texture features of the first face image and the identity features of the second face image to form spliced features, and a pre-trained decoder is utilized to decode the spliced features to obtain a synthesized face image fused with the texture features of the first face image and the identity features of the second face image;
wherein the texture feature generation network comprises the generator of a trained generative adversarial network;
the discriminator of the generative adversarial network is used to discriminate: whether a face image, obtained by decoding the feature formed by splicing the texture features and identity features extracted from a test face image respectively input into the texture feature generation network and the face recognition network, is consistent with the test face image.
6. The apparatus of claim 5, wherein the apparatus further comprises:
a first training unit configured to train the texture feature generation network and the decoder based on a first sample face image set in the following manner:
taking a generator in a preset generative adversarial network as the texture feature generation network to be trained, and acquiring a decoder to be trained;
inputting a first sample face image in the first sample face image set into the texture feature generation network to be trained to obtain texture features of the first sample face image;
inputting the first sample face image into the trained face recognition network for identity feature extraction to obtain identity features of the first sample face image;
splicing the texture features and the identity features of the first sample face image to obtain fusion features of the first sample face image;
decoding the fusion features of the first sample face image by using the decoder to be trained to obtain a predicted face image corresponding to the first sample face image;
and discriminating between the first sample face image and the corresponding predicted face image by using the discriminator in the preset generative adversarial network, and iteratively adjusting parameters of the texture feature generation network to be trained, the discriminator, and the decoder to be trained according to the discrimination error of the discriminator.
7. The apparatus of claim 5 or 6, wherein the apparatus further comprises:
a second training unit configured to train the face recognition network based on a second sample face image set, wherein a second sample face image in the second sample face image set is annotated with identity information of the corresponding face;
wherein the second training unit is configured to train the face recognition network based on the second sample face image set in the following manner:
supervising the training of the face recognition network with a pre-constructed loss function, wherein the value of the loss function is negatively correlated with the ability of the face recognition network to distinguish face images with different identity information.
8. The apparatus of claim 5 or 6, wherein the apparatus further comprises:
a construction unit configured to construct a synthetic face image set based on the synthesized face image;
and a third training unit configured to train a face-swap detection model using the synthetic face image set.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1-4.
CN202010281600.5A 2020-04-10 2020-04-10 Method and device for generating face image Active CN111523413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010281600.5A CN111523413B (en) 2020-04-10 2020-04-10 Method and device for generating face image

Publications (2)

Publication Number Publication Date
CN111523413A (en) 2020-08-11
CN111523413B (en) 2023-06-23

Family

ID=71902720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010281600.5A Active CN111523413B (en) 2020-04-10 2020-04-10 Method and device for generating face image

Country Status (1)

Country Link
CN (1) CN111523413B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070662B (en) * 2020-11-12 2021-02-26 北京达佳互联信息技术有限公司 Evaluation method and device of face changing model, electronic equipment and storage medium
CN112419455B (en) * 2020-12-11 2022-07-22 中山大学 Human skeleton sequence information-based character action video generation method and system and storage medium
CN112926689A (en) * 2021-03-31 2021-06-08 珠海格力电器股份有限公司 Target positioning method and device, electronic equipment and storage medium
CN112861825B (en) * 2021-04-07 2023-07-04 北京百度网讯科技有限公司 Model training method, pedestrian re-recognition method, device and electronic equipment
CN113822790B (en) * 2021-06-03 2023-04-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
CN113962845B (en) * 2021-08-25 2023-08-29 北京百度网讯科技有限公司 Image processing method, image processing apparatus, electronic device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10896535B2 (en) * 2018-08-13 2021-01-19 Pinscreen, Inc. Real-time avatars using dynamic textures

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777131A (en) * 2010-02-05 2010-07-14 西安电子科技大学 Method and device for identifying human face through double models
CN105335706A (en) * 2015-10-16 2016-02-17 沈阳工业大学 Double-frequency mixed texture fusion method
CN108319911A (en) * 2018-01-30 2018-07-24 深兰科技(上海)有限公司 Biometric identity certification and payment system based on the identification of hand arteries and veins and identity identifying method
CN108596024A (en) * 2018-03-13 2018-09-28 杭州电子科技大学 A kind of illustration generation method based on human face structure information
CN108629823A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 The generation method and device of multi-view image
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change
CN109255831A (en) * 2018-09-21 2019-01-22 南京大学 The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate
CN109271966A (en) * 2018-10-15 2019-01-25 广州广电运通金融电子股份有限公司 A kind of identity identifying method, device and equipment based on finger vein
CN109886881A (en) * 2019-01-10 2019-06-14 中国科学院自动化研究所 Face dressing minimizing technology
CN109886167A (en) * 2019-02-01 2019-06-14 中国科学院信息工程研究所 One kind blocking face identification method and device
CN110517185A (en) * 2019-07-23 2019-11-29 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN110660020A (en) * 2019-08-15 2020-01-07 天津中科智能识别产业技术研究院有限公司 Image super-resolution method of countermeasure generation network based on fusion mutual information
CN110706157A (en) * 2019-09-18 2020-01-17 中国科学技术大学 Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN110751098A (en) * 2019-10-22 2020-02-04 中山大学 Face recognition method for generating confrontation network based on illumination and posture

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Liquid warping GAN: A unified framework for human motion imitation, appearance transfer and novel view synthesis; Wen Liu et al.; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2019; pp. 5904-5913 *
Pose-normalized image generation for person re-identification; Xuelin Qian et al.; Proceedings of the European Conference on Computer Vision (ECCV); 2018; pp. 650-667 *
Sketch face synthesis method based on a double-layer generative adversarial network; Li Kaixuan et al.; Computer Applications and Software; 2019, No. 12; pp. 182-189 *
Heterogeneous face image synthesis based on generative adversarial networks: progress and challenges; Huang Fei et al.; Journal of Nanjing University of Information Science & Technology (Natural Science Edition); 2019, No. 06; pp. 40-61 *
A survey of generative adversarial networks and their computer vision applications; Cao Yangjie et al.; Journal of Image and Graphics; 2018, No. 10; pp. 5-21 *


Similar Documents

Publication Publication Date Title
CN111523413B (en) Method and device for generating face image
CN107766940B (en) Method and apparatus for generating a model
CN108520220B (en) Model generation method and device
CN109214343B (en) Method and device for generating face key point detection model
CN109858445B (en) Method and apparatus for generating a model
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
WO2019242222A1 (en) Method and device for use in generating information
CN109993150B (en) Method and device for identifying age
CN111539903B (en) Method and device for training face image synthesis model
KR102488530B1 (en) Method and apparatus for generating video
CN111523640B (en) Training method and device for neural network model
CN109670444B (en) Attitude detection model generation method, attitude detection device, attitude detection equipment and attitude detection medium
CN113139628B (en) Sample image identification method, device and equipment and readable storage medium
CN109800730B (en) Method and device for generating head portrait generation model
CN111524216B (en) Method and device for generating three-dimensional face data
CN111415336B (en) Image tampering identification method, device, server and storage medium
CN113221983B (en) Training method and device for transfer learning model, image processing method and device
JP2023526899A (en) Methods, devices, media and program products for generating image inpainting models
CN110728319B (en) Image generation method and device and computer storage medium
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
CN110008926B (en) Method and device for identifying age
CN113570689A (en) Portrait cartoon method, apparatus, medium and computing device
CN110619602B (en) Image generation method and device, electronic equipment and storage medium
CN110472558B (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant