CN111275784A

CN111275784A - Method and device for generating image

Info

Publication number: CN111275784A
Application number: CN202010065575.7A
Authority: CN
Inventors: 李鑫; 刘霄; 张赫男; 孙昊; 李甫; 何栋梁; 周志超; 龙翔; 王平; 文石磊; 丁二锐
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-12
Anticipated expiration: 2040-01-20
Also published as: CN111275784B

Abstract

The embodiment of the disclosure discloses a method and a device for generating an image. The method comprises the following steps: acquiring a first image comprising a first face; inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint. The method can improve the accuracy and efficiency of the generation type countermeasure network for generating the second image based on the input first image, reduce the probability of generating the image by mistake and reduce the constraint that the input image can only be the image in the preset area.

Description

Method and device for generating image

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to the field of image conversion technologies, and in particular, to a method and an apparatus for generating an image.

Background

At present, a plurality of self-shooting special effect playing methods such as changing children and changing faces have great interest in the market, and a plurality of users like playing or even have phenomenon level Application (APP) of flaming over one night.

The existing technical scheme has two types: one is a photo caricaturing method (cartoongan) based on a generative confrontation network, which is similar to the addition of a cartoon filter to an input image and is essentially incapable of generating a useful cartoon character. The other is a recently published paper-unsupervised generation attention network (UGATIT) algorithm, but in practical use, it is found that the generation rate of cartoon characters is low, many bad cases (such as eyes appearing on clothes, many textures on face) or no generation at all are found, and there is a strict limit to the input image (square, only front face image can be input), and the existing method can only generate girls.

Disclosure of Invention

The embodiment of the disclosure provides a method and a device for generating an image.

In a first aspect, an embodiment of the present disclosure provides a method for generating an image, including: acquiring a first image comprising a first face; inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

In some embodiments, inputting the first image into a pre-trained generative confrontation network comprises: carrying out Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network; or detecting whether the texture characteristic parameter value of the first image is larger than a texture threshold value, if so, performing Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network.

In some embodiments, generating the confrontation network to use the face attribute information generated based on the input image as a constraint comprises: the generative confrontation network takes the face key points and/or the face semantic segmentation result generated based on the input image as constraints.

In some embodiments, the generative confrontation network using the face keypoints and/or face semantic segmentation results generated based on the input image as constraints comprises: the generative confrontation network takes a multi-channel face image generated based on an input image as an input; the multichannel face image comprises RGB three channels of an input image and at least one of the following items of the input image: a first binary image channel or a RGB three channel of the face key points; a binary image one channel or an RGB three channel of the human face semantic segmentation result; and a binary map of hair-channel.

In some embodiments, pre-trained generative confrontation networks are used to train input image samples comprising: and carrying out data enhancement on the initial image sample to obtain the image sample.

In some embodiments, data enhancing the initial image sample comprises: performing at least one of the following operations on the initial image sample: rotation, flipping, zooming, and different degrees of gaussian blur.

In some embodiments, the generative confrontation network includes at least any one of: the system comprises a generating type confrontation network GAN, a cycle coincidence generating type confrontation network cycleGAN, a human face high-precision attribute editing model AttGAN, a star generating type confrontation network StarGAN and a space transformer generating type confrontation network STGAN.

In some embodiments, the first image is a real face image; the second image is a cartoon image.

In a second aspect, an embodiment of the present disclosure provides an apparatus for generating an image, including: an acquisition unit configured to acquire a first image including a first face; the generating unit is configured to input the first image into a pre-trained generative confrontation network to obtain a second image which comprises a second face and is output by the generative confrontation network; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

In some embodiments, the generating unit is further configured to: carrying out Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network; or detecting whether the texture characteristic parameter value of the first image is larger than a texture threshold value, if so, performing Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network.

In some embodiments, the generative confrontation network employed in the generation unit having the face attribute information generated based on the input image as a constraint comprises: the generative confrontation network employed in the generation unit takes face keypoints and/or face semantic segmentation results generated based on the input image as constraints.

In some embodiments, the generative confrontation network employed in the generation unit includes, as constraints, face keypoints and/or face semantic segmentation results generated based on the input image: the generative confrontation network adopted in the generating unit takes a multi-channel face image generated based on an input image as an input; the multichannel face image comprises RGB three channels of an input image and at least one of the following items of the input image: a first binary image channel or a RGB three channel of the face key points; a binary image one channel or an RGB three channel of the human face semantic segmentation result; and a binary map of hair-channel.

In some embodiments, generating the input image samples for training of the pre-trained generative confrontation network employed in the unit comprises: and carrying out data enhancement on the initial image sample to obtain the image sample.

In some embodiments, generating the input image samples for training of the pre-trained generative confrontation network employed in the unit comprises: performing at least one of the following operations on the initial image sample: rotation, flipping, zooming, and different degrees of gaussian blur.

In a third aspect, an embodiment of the present disclosure provides an electronic device/terminal/server, including: one or more processors; storage means for storing one or more programs; when executed by one or more processors, cause the one or more processors to implement a method of generating an image as described in any above.

In a fourth aspect, the disclosed embodiments provide a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method of generating an image as described in any of the above.

The method and the device for generating the image provided by the embodiment of the disclosure firstly obtain a first image comprising a first face; inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint. In this process, the accuracy and efficiency of the generation countermeasure network for generating the second image based on the input first image can be improved, the probability of generating the image by mistake is reduced, and the constraint that the input image can only be the image in the predetermined area is reduced.

Drawings

Other features, objects, and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;

FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a method of generating an image in accordance with an embodiment of the present disclosure;

FIG. 3 is an exemplary application scenario of a method of generating an image according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart diagram of yet another embodiment of a method of generating an image in accordance with an embodiment of the present disclosure;

FIG. 5a is an exemplary diagram of a first image of an input pre-trained recurrent consensus generative countermeasure network in yet another embodiment of a method of generating images according to an embodiment of the present disclosure;

fig. 5b is a face keypoint image represented by RGB three channels extracted based on the first image in fig. 5a according to an embodiment of the present disclosure;

FIG. 5c is an image of a segmentation result of a hair represented in a binary image-one channel extracted based on the first image in FIG. 5a, according to an embodiment of the present disclosure;

FIG. 5d is a face semantic segmentation result image expressed by RGB three channels extracted based on the first image in FIG. 5a according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a pre-trained cyclic consensus generative countermeasure network of a method of generating images according to an embodiment of the present disclosure;

FIG. 7 is an exemplary block diagram of one embodiment of an apparatus for generating an image of the present disclosure;

FIG. 8 is a schematic block diagram of a computer system suitable for use with a server embodying embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. Those skilled in the art will also appreciate that although the terms "first," "second," etc. may be used herein to describe various faces, images, etc., these faces, images, etc. should not be limited by these terms. These terms are used only to distinguish one face, image from other faces, images.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the image generating method or image generating apparatus of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The

terminal devices

101, 102, 103 may have various communication client applications installed thereon, such as a browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices that support browser applications, including but not limited to tablet computers, laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.

The server 105 may be a server providing various services, such as a background server providing support for browser applications running on the

terminal devices

101, 102, 103. The background server can analyze and process the received data such as the request and feed back the processing result to the terminal equipment.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules, for example, to provide distributed services, or as a single piece of software or software module. And is not particularly limited herein.

In practice, the method for generating the image provided by the embodiment of the present disclosure may be executed by the

terminal devices

101, 102, 103 and/or the

servers

105, 106, and the apparatus for generating the image may also be disposed in the

terminal devices

101, 102, 103 and/or the

servers

105, 106.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, fig. 2 illustrates a flow 200 of one embodiment of a method of generating an image according to the present disclosure. The method of generating an image comprises the steps of:

step 201, a first image including a first face is acquired.

In this embodiment, an execution subject (for example, a terminal or a server shown in fig. 1) of the method for generating an image may acquire the first image including the first face from a local or remote album or a database, or may acquire the first image including the first face via a local or remote photographing service.

Step 202, inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes the face attribute information generated based on the first image as a constraint.

In this embodiment, the pre-trained Generative confrontation network refers to a deep learning model developed based on a Generative Adaptive Network (GAN). For example, the generative confrontation network GAN, the cyclic coincidence generative confrontation network CycleGan, the human face high-precision attribute editing model AttGAN, the star generative confrontation network StarGAN, the space transformer generative confrontation network STGAN, the generative confrontation networks DualGAN and DiscoGAN for dual learning, and the like.

The pre-trained generative confrontation network generally includes a generator g (generator) and a Discriminator (Discriminator). There are two data fields, X, Y. G is responsible for emulating the data in the X domain into real data and hiding them in the real data, while D is responsible for separating the forged data from the real data. After the two games, the counterfeiting technology of G is more and more severe, and the authentication technology of D is also more and more severe. Until D can no longer tell whether the data is real or G generated, the process of fighting reaches a dynamic equilibrium.

Training a generative confrontation network requires two loss functions: a reconstruction loss function of the generator and a discriminant loss function of the discriminator. The reconstruction loss function is used for determining whether the generated picture is similar to the original picture as much as possible; the discriminant loss function is used for inputting the generated false picture and the original true picture into a discriminant to obtain the two-classification loss with a formula of 0 and 1.

The generator is composed of an encoder, a converter and a decoder. The encoder uses a convolutional neural network to extract features from the input image. For example, the image is compressed into 256 feature vectors of 64 x 64. The converter converts the feature vectors of the image in the DA domain to the feature vectors in the DB domain by combining the dissimilar features of the image. For example, 6-layer Reset modules, each of which is a neural network layer composed of two convolutional layers, can be used to achieve the goal of preserving the original image features while converting. The decoder uses the deconvolution layer (deconvolution) to complete the work of restoring low-level features from the feature vector, and finally, the generated image is obtained.

The discriminator takes an image as input and tries to predict whether it is the original image or the output image of the generator. The discriminator itself belongs to a convolutional network, and it is necessary to extract features from the image and then determine whether the extracted features belong to a particular class by adding a convolutional layer that produces a one-dimensional output.

The above-described pre-trained generative confrontation network uses face attribute information generated based on an input image as a constraint. The face attribute information is a series of biological characteristic information representing the face characteristics, has strong self stability and individual difference, and identifies the identity of a person. Including gender, skin color, age, expression, etc.

The generative confrontation network may be implemented by using a method for constraint of a machine learning network in the prior art or a technology developed in the future when face attribute information generated based on an input image (for example, a first image input when the generative confrontation network is applied or a first image sample input when the generative confrontation network is trained) is used as the constraint, which is not limited in the present application.

In a specific example of the present application, the face attribute information generated based on the input image may be used as an input of any one or more layers of the generator network in the generative countermeasure network, and be combined with an original input of the layer network to serve as an input, so as to improve the relevance of the output image of the generator network, which introduces the constraint, and the face attribute information in machine learning.

In another specific example of the present application, the above-mentioned face attribute information generated based on the input image, using facial marker features to define consistency loss, instructs to train the discriminator in the generative confrontation network. First, the generator generates a rough second image based on the input first image. Subsequently, the generative confrontation network generates a pre-training regression (regressor) based on the generated second image to predict the facial landmark, and marks the key points of the face. And finally, thinning the face characteristics corresponding to the first image in the second image through a local discriminator and a global discriminator. At this stage, the consistency of landmark is emphasized, so that the final result is realistic and discernable.

As will be appreciated by those skilled in the art, the generative confrontation network uses face attribute information generated based on the input images as constraints. Wherein, when training the generative confrontation network, the input image may be the first input image sample. The generative confrontation network extracts the face attribute information of the input first image sample, and the face attribute information is used as the constraint of the generative confrontation network to obtain an output generated image. In applying the generative confrontation network, the input image may be an input first image including a first face. The generative confrontation network extracts the face attribute information of the input first image comprising the first face, and takes the face attribute information of the first image as the constraint of the generative confrontation network to obtain the output second image comprising the second face.

The first image input into the generated countermeasure network may be an image including a human face. The second image generated as an output of the confrontational network may be an image of a different style or gender from the first image and including a human face.

In some optional implementations of this embodiment, the generating confrontation network using the face attribute information generated based on the input image as a constraint includes: the generative confrontation network takes the face key points and/or the face semantic segmentation result generated based on the input image as constraints.

In this implementation manner, the execution subject may adopt a face key point extraction technique to extract a face key point of the input image, and use the face key point as a constraint when the generation-type countermeasure network generates the output image. Alternatively or additionally, the execution subject may employ a face semantic segmentation technique, with the face semantic segmentation result as a constraint in the generation-based countermeasure network to generate the output image.

The generation type countermeasure network in the implementation mode adopts the face key points and/or the face semantic segmentation result generated based on the input image as the constraint, so that the generator can link the five sense organs of the input image and the five sense organs of the output image, the face features cannot be generated at other parts by mistake, the input image can be a larger image comprising a face and is not limited to only the input face image, and the accuracy and the quality of the output image are improved.

In some optional implementations of this embodiment, inputting the first image into a pre-trained generative confrontation network includes: and performing Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network.

In this implementation, Gaussian Blur (also called Gaussian smoothing) can reduce image noise and detail level, and enhance the image effect of the image under different scale sizes (see scale space representation and scale space implementation). From the mathematical point of view, the gaussian blurring process of an image is to convolute the image with a normal distribution. Since a normal distribution is also called a gaussian distribution, this technique is also called gaussian blurring.

The first images after the Gaussian blur of different degrees are used as the input of the generation type countermeasure network, the second images with different definitions can be obtained, and therefore the required clear second images can be determined on the basis of the second images with different definitions.

Alternatively or additionally, inputting the first image into a pre-trained generative confrontation network comprises: and detecting whether the texture characteristic parameter value of the first image is larger than a texture threshold value, if so, performing Gaussian blur of different degrees on the first image, and inputting the first image subjected to Gaussian blur of different degrees into a pre-trained generative confrontation network.

Here, the texture feature parameter value of the image refers to a parameter value for characterizing a texture feature of the image. Such as the thickness, density, and directionality of the texture. When the texture feature parameter value of the first image is detected to be larger than the texture threshold value, the texture of the first image is indicated to be complex. In general, image content generated based on images with complex textures is messy. Therefore, the first images can be subjected to different degrees of gaussian blur, so that the second images are respectively generated for the first images subjected to different degrees of gaussian blur, and the second images with different definitions are obtained. Thereafter, a desired sharp second image may be determined on the basis of the second images of different sharpness, thereby improving the quality of the generated image.

The pre-trained generative confrontation network can adopt a crawler to crawl image samples from the network or directly acquire an image sample data set, and each image in the images comprises a human face; then, the image sample obtained or crawled can be directly used as the image sample of the training generation type countermeasure network; or, the crawled image including the human face is used as an initial image sample, the initial image sample is further subjected to data processing to obtain a screened image meeting the requirements of the image sample, and the screened image is used as an image sample of the training generation type countermeasure network.

In some optional implementations of this embodiment, the pre-trained generative confrontation network is used for training the input image samples including: and carrying out data enhancement on the initial image sample to obtain the image sample.

In this implementation, the data enhancement may include operations such as rotation, translation, flipping, and the like. The pre-trained generative confrontation network needs image samples with consistent styles, different sexes, different angles and different face sizes according to training data, so that data enhancement can be performed on initial image samples, the training data volume is increased, the generalization capability of the generative confrontation network is improved, noise data is increased, and the robustness of the generative confrontation network is improved.

In some optional implementations of the present embodiment, the data enhancing the initial image sample comprises: performing at least one of the following operations on the initial image sample: rotation, flipping, zooming, and different degrees of gaussian blur.

In the implementation mode, one or more operations of rotation, turning, scaling and Gaussian blur of different degrees are carried out on the initial image sample, so that the generalization capability and robustness of the trained generative confrontation network can be improved.

In the method for generating an image according to the above embodiment of the disclosure, since the face attribute information of the first image is used as the constraint of the generative countermeasure network in the process of generating the second image based on the first image by the generative countermeasure network, the quality, accuracy and efficiency of generating the second image based on the input first image by the generative countermeasure network can be improved, the probability of generating an image by mistake is reduced, and the constraint that the input image can only be an image in a predetermined area is reduced.

An exemplary application scenario of the method of generating an image of the present disclosure is described below in conjunction with fig. 3.

As shown in fig. 3, fig. 3 illustrates one exemplary application scenario of the method of generating an image according to the present disclosure.

As shown in fig. 3, a method 300 of generating an image operates in an electronic device 310 and may include:

firstly, acquiring a first image 302 comprising a first face 301;

then, based on the face 301 in the first image 302, face attribute information 303 is generated;

finally, the first image 302 is input into a pre-trained generative confrontation network 304, and the face attribute information 303 is used as the constraint of the generative confrontation network 304, so as to obtain a second image 306 including a second face 305 output by the generative confrontation network 304.

It should be understood that the application scenario of the method for generating an image shown in fig. 3 is only an exemplary description of the method for generating an image, and does not represent a limitation on the method. For example, the steps shown in fig. 3 above may be implemented in further detail. Other steps for generating an image may be further added to the above-described fig. 3.

With further reference to fig. 4, fig. 4 shows a schematic flow chart diagram of one embodiment of a method of uploading files in a method of generating images according to the present disclosure.

As shown in fig. 4, a method 400 for uploading a file in a method for generating an image according to this embodiment may include the following steps:

step 401, a first image including a first face is acquired.

Step 402, acquiring an RGB three-channel image of the face key point of the first image based on the first image.

In this embodiment, a face keypoint extraction technique in the prior art or a future developed technique may be adopted to obtain a face keypoint image of the first image based on the first image, which is not limited in this application. For example, face keypoint extraction may be performed using an Active Shape Model (ASM), an Active appearance Model (AMM), a cascade Shape Regression Model (CSR), a face Alignment algorithm (DAN), or the like.

Step 403, acquiring a binary image of the hair of the first image based on the first image.

In this embodiment, a hair segmentation technique in the prior art or a future developed technique may be adopted to obtain a binary image of the hair of the first image based on the first image, which is not limited in this application. For example, a technique of segmenting hair may be used to obtain a hair segmentation result, and then convert the hair segmentation result into a binary image of hair.

And step 404, splicing the RGB three-channel image of the first image, the RGB three-channel image of the key points of the human face and the binary image of the hair into a multi-channel human face image.

In this embodiment, the multi-channel face image is a face image including three RGB channels of the first image and the following items of the first image: the face key point RGB three channels and the hair binary image one channel.

The three channels of RGB of the image mean that each pixel point is represented by three values of RGB, and the three channels are respectively: red channel r (red), green channel g (green), blue channel b (blue).

The RGB three channels of the face key points of the first image are obtained by extracting the face key points from the first image, and each pixel point of the face key point feature map is represented by the RGB three channels. In a specific example, the first input image is shown in fig. 5a, and the face key points represented by RGB three channels as shown in fig. 5b can be extracted.

The first channel of the binary image of the hair of the input image is a binary image of the hair obtained by adopting a hair segmentation technology, and each pixel point of a hair segmentation result is represented by the first channel of the binary image. The binary image means that there are only two gray levels in the image, that is, the gray value of any pixel in the image is 0 or 255, which represents black and white respectively. In a specific example, the input image is shown in fig. 5a, and the result of the segmentation of the hair using the binary image-channel representation shown in fig. 5c can be obtained.

Step 405, inputting the multi-channel face image into a pre-trained cyclic consistent generating type confrontation network to obtain a second image which is output by the pre-trained cyclic consistent generating type confrontation network and comprises a second face.

In this embodiment, the multi-channel face image may be input into a pre-trained cyclic consistent generation type confrontation network (CycleGan), so as to obtain a second image output by the cyclic consistent generation type confrontation network.

In a specific example, the first image is a real face image; the second image is a cartoon image. The cartoon image can be a sketch or a base image of a wall painting, an oil painting, a carpet and the like, and can also be a cartoon, a ironic painting, a humorous painting and the like.

It can be understood by those skilled in the art that the multi-channel face image samples adopted when the cyclic generated confrontation network is trained in advance are adapted to the multi-channel face images input when the cyclic generated confrontation network is applied, so as to obtain the trained cyclic generated confrontation network suitable for the multi-channel face images input when the cyclic generated confrontation network is applied.

In one specific example, the structure of a pre-trained cyclic consensus generative countermeasure network is shown in fig. 6, comprising two generators G _ A2B, G _ B2A, and two discriminators D _ a and D _ B. Wherein the generator G _ A2B is for generating cartoons based on input images of real persons, and the generator G _ B2A is for generating real persons based on images of cartoons. The discriminator D _ A is used for judging whether the image is a real face image according to the output result of (0, 1), and the discriminator D _ B is used for judging whether the image is a real cartoon image according to the output result of (0, 1).

During training, the combined network of the two groups of generators and the discriminator are alternately trained, and after training is finished, a real person is used for getting on the G _ A2B generator of the cartoon.

It will be understood by those skilled in the art that the embodiment shown in fig. 4 is only an exemplary embodiment of the present application and does not represent a limitation of the present application.

For example, the pre-trained cyclic consensus generative countermeasure network described above may be another generative countermeasure network based on GAN improvement.

For another example, the multi-channel face image in fig. 4 may further include three channels of face semantic segmentation result RGB of the first image on the basis of three channels of RGB including the first image.

The three RGB channels of the face semantic segmentation result of the first image are obtained by extracting the face semantic segmentation result from the input image, and each pixel point of the face semantic segmentation result is represented by the three RGB channels. In a specific example, the input image is fig. 5a, and a face semantic segmentation result expressed by RGB three channels as shown in fig. 5d can be extracted.

Here, the semantic segmentation of the face of the first image is an extension of the target detection, and the output of the semantic segmentation is a color mask of the target distinguished by categories, so that the target can be more accurately positioned and is not influenced by the complex shape of the target.

When the face semantic segmentation is performed, a face semantic segmentation result of the first image may be obtained based on the first image by using a face semantic segmentation technology in the prior art or a future developed technology, which is not limited in the present application. In some specific examples, the technique of segmenting the face semantics may employ a full convolution neural network fcn (full probabilistic networks for semantic segmentation), a semantic segmentation network (SegNet, semantic segmentation), a hole convolution (scaled relationships), semantic segmentation (deep lab (V1, V2, V3, etc.)), an image segmentation model (RefineNet), a pyramid scene parsing network (PSPNet), and the like to obtain the face semantics segmentation result of the first image based on the first image.

For another example, the multi-channel face image in fig. 4 may further include a binary image-channel of the face semantic segmentation result on the basis of three RGB channels including the first image. Alternatively, the multi-channel face image in fig. 4 may further include a binary image one channel of face key points of the first image and a binary image one channel of hair on the basis of RGB three channels including the first image. Alternatively, the multi-channel face image in fig. 4 may further include three channels of face key points RGB of the first image and a binary image one channel of the face semantic segmentation result on the basis of three channels of RGB including the first image. And will not be described in detail herein.

The method for generating an image in the embodiment shown in fig. 4 of the present disclosure further discloses that, on the basis of the method for generating an image shown in fig. 2, a multi-channel face image is input into a pre-trained cyclic consensus generated confrontation network, and a second image including a second face output by the pre-trained cyclic consensus generated confrontation network is obtained. In the process, the multichannel face image is used as the reference for generating the second image, so that the accuracy and the quality of the generated second image are improved.

As an implementation of the method shown in the above figures, the embodiment of the present disclosure provides an embodiment of an apparatus for generating an image, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2 to 6, and the apparatus may be specifically applied to the terminal or the server shown in fig. 1.

As shown in fig. 7, the apparatus 700 for generating an image of the present embodiment may include: an acquisition unit 710 configured to acquire a first image including a first face; a generating unit 720, configured to input the first image into a pre-trained generative confrontation network, resulting in a second image comprising a second face output by the generative confrontation network; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

In some embodiments, the generating unit 720 is further configured to: carrying out Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network; or detecting whether the texture characteristic parameter value of the first image is larger than a texture threshold value, if so, performing Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network.

In some embodiments, the generative confrontation network employed in the generation unit 720 includes, as constraints, face attribute information generated based on the input image: the generative confrontation network employed in the generation unit takes face keypoints and/or face semantic segmentation results generated based on the input image as constraints.

In some embodiments, the generative confrontation network employed in the generation unit 720 takes the face keypoints and/or face semantic segmentation results generated based on the input image as constraints including: the generative confrontation network adopted in the generating unit takes a multi-channel face image generated based on an input image as an input; the multichannel face image comprises RGB three channels of an input image and at least one of the following items of the input image: a first binary image channel or a RGB three channel of the face key points; a binary image one channel or an RGB three channel of the human face semantic segmentation result; and a binary map of hair-channel.

In some embodiments, the pre-trained generative confrontation network employed in the generation unit 720 includes, for training, input image samples: and carrying out data enhancement on the initial image sample to obtain the image sample.

In some embodiments, the pre-trained generative confrontation network employed in the generation unit 720 includes, for training, input image samples: performing at least one of the following operations on the initial image sample: rotation, flipping, zooming, and different degrees of gaussian blur.

It should be understood that the various elements recited in the apparatus 700 correspond to various steps recited in the methods described with reference to fig. 2-5. Thus, the operations and features described above for the method are equally applicable to the apparatus 700 and the various units included therein, and are not described in detail here.

Referring now to fig. 8, a schematic diagram of an electronic device (e.g., a server or terminal device of fig. 1) 800 suitable for use in implementing embodiments of the present disclosure is shown. Terminal devices in embodiments of the present disclosure may include, but are not limited to, devices such as notebook computers, desktop computers, and the like. The terminal device/server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or may represent multiple devices as desired.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a first image comprising a first face; inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a generation unit. Where the names of the cells do not in some cases constitute a limitation of the cell itself, for example, the acquisition cell may also be described as a "cell acquiring a first image comprising a first face".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept as defined above. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims

1. A method of generating an image, comprising:

acquiring a first image comprising a first face;

inputting the first image into a pre-trained generative confrontation network to obtain a second image which is output by the generative confrontation network and comprises a second face; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

2. The method of claim 1, wherein said inputting the first image into a pre-trained generative confrontation network comprises:

carrying out Gaussian blur of different degrees on the first image, and inputting the first image after the Gaussian blur of different degrees into a pre-trained generative confrontation network; or

And detecting whether the texture characteristic parameter value of the first image is larger than a texture threshold value, if so, performing Gaussian blur of different degrees on the first image, and inputting the first image subjected to Gaussian blur of different degrees into a pre-trained generative confrontation network.

3. The method of any one of claims 1 or 2, wherein the generative confrontation network constraining the generated face attribute information based on the input image comprises: the generative confrontation network takes the face key points and/or the face semantic segmentation result generated based on the input image as constraints.

4. The method of claim 3, wherein the generative confrontation network using face keypoints and/or face semantic segmentation results generated based on the input image as constraints comprises:

the generative confrontation network takes a multi-channel face image generated based on an input image as an input; the multichannel face image comprises RGB three channels of an input image and at least one of the following items of the input image:

a first binary image channel or a RGB three channel of the face key points;

a binary image one channel or an RGB three channel of the human face semantic segmentation result; and

the binary image of the hair — the channel.

5. The method of claim 1, wherein the pre-trained generative confrontation network uses input image samples for training comprising:

and carrying out data enhancement on the initial image sample to obtain the image sample.

6. The method of claim 5, wherein the data enhancing the initial image sample comprises:

performing at least one of the following operations on the initial image sample: rotation, flipping, zooming, and different degrees of gaussian blur.

7. The method of claim 1, wherein the generative countermeasure network comprises at least any one of: the system comprises a generating type confrontation network GAN, a cycle coincidence generating type confrontation network cycleGAN, a human face high-precision attribute editing model AttGAN, a star generating type confrontation network StarGAN and a space transformer generating type confrontation network STGAN.

8. The method according to any one of claims 1-7, wherein the first image is a real face image; the second image is a cartoon image.

9. An apparatus for generating an image, comprising:

an acquisition unit configured to acquire a first image including a first face;

the generating unit is configured to input the first image into a pre-trained generative confrontation network, and obtain a second image which comprises a second face and is output by the generative confrontation network; wherein the generative confrontation network takes face attribute information generated based on the input image as a constraint.

10. The apparatus of claim 9, wherein the generating unit is further configured to:

11. The apparatus according to any one of claims 9 or 10, wherein the generative confrontation network employed in the generating means having, as constraints, face attribute information generated based on the input image comprises: the generative confrontation network adopted in the generating unit takes the human face key points and/or the human face semantic segmentation results generated based on the input images as constraints.

12. The apparatus of claim 11, wherein the generative confrontation network employed in the generating unit takes the face keypoints and/or face semantic segmentation results generated based on the input image as constraints comprises:

the generation type countermeasure network adopted in the generation unit takes a multi-channel face image generated based on an input image as an input; the multichannel face image comprises RGB three channels of an input image and at least one of the following items of the input image:

a first binary image channel or a RGB three channel of the face key points;

the binary image of the hair — the channel.

13. The apparatus of claim 9, wherein the pre-trained generative confrontation network employed in the generating unit is to use the input image samples for training comprises:

14. The apparatus of claim 13, wherein the pre-trained generative confrontation network employed in the generating unit is to use the input image samples for training comprises:

15. The apparatus of claim 9, wherein the generative countermeasure network comprises at least any one of: the system comprises a generating type confrontation network GAN, a cycle coincidence generating type confrontation network cycleGAN, a human face high-precision attribute editing model AttGAN, a star generating type confrontation network StarGAN and a space transformer generating type confrontation network STGAN.

16. The apparatus according to any one of claims 9-15, wherein the first image is a real face image; the second image is a cartoon image.

17. An electronic device/terminal/server comprising:

one or more processors;

storage means for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.

18. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.