CN111145080A - Training method of image generation model, image generation method and device


Info

Publication number: CN111145080A (application CN201911215480.2A; granted as CN111145080B)
Authority: CN (China)
Inventor: 方轲
Assignee: Reach Best Technology Co Ltd
Legal status: Active (granted)
Other languages: Chinese (zh)
Prior art keywords: image, target, sample, trained, model

Classifications

    • G06T3/04
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253 Fusion techniques of extracted features
    • G06T11/001 Texturing; Colouring; Generation of texture or colour

Abstract

The present disclosure relates to a training method of an image generation model, an image generation method and an image generation device. The training method of the image generation model includes the following steps: inputting a sample image into a model to be trained; extracting target feature information of the sample image, and fusing the target feature information with first feature information of a target object to generate a sample target image; down-sampling the sample image and the sample target image; inputting the sample image and the sample target image into a first type discriminator module, and inputting the down-sampled sample image and sample target image into a second type discriminator module; and obtaining a trained image generation model when a first loss value of the image generator module, a second loss value of the first type discriminator module and a third loss value of the second type discriminator module meet a first preset condition. In this way, the sample target image carries both the global structure information and the local detail information of the sample image, restores the sample image well, and has high definition.

Description

Training method of image generation model, image generation method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for an image generation model, an image generation method, and an image generation device.
Background
With the rapid development of mobile terminals, mobile terminals play an increasingly important role in users' lives and influence many aspects of daily life. In particular, with the rise of multimedia such as images, music and videos, users interact with mobile terminals through various multimedia social applications, and images, as a major component of multimedia, are a key focus of most of these applications.
Image processing is a very active direction in the image field. For example, a child's image may be used to predict how the child will look when grown, that is, the child image is changed into an adult image. However, in the related art, after the child image is changed into an adult image, the resulting adult image has low definition, the child-to-adult conversion effect is not obvious, and the generated adult does not resemble the child.
Disclosure of Invention
In order to solve the technical problems described in the background art, embodiments of the present disclosure provide a training method for an image generation model, an image generation method, and an image generation device, and a technical solution of the present disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a training method for an image generation model, including:
inputting a sample image containing a target object into a model to be trained;
the image generator module of the model to be trained extracts target characteristic information of the sample image, and performs fusion processing on the target characteristic information and first characteristic information of the target object to generate a sample target image; the target characteristic information is: characteristic information other than the first characteristic information, the first characteristic information being: feature information for characterizing an age feature, a gender feature, or a color feature of the target object;
respectively performing down-sampling on the sample image and the sample target image to obtain a down-sampled sample image and a down-sampled sample target image;
inputting the sample image and the sample target image into a first type discriminator module of the model to be trained, and correspondingly inputting the down-sampled sample image and the down-sampled sample target image into a second type discriminator module of the model to be trained;
and when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition, determining the model to be trained as a trained image generation model.
Optionally, the model to be trained further includes a label classifier module;
after the image generator module of the model to be trained extracts the target feature information of the sample image, the method further comprises:
inputting the target characteristic information into the tag classifier module to obtain a judgment result, wherein the judgment result is whether the target characteristic information comprises the first characteristic information;
if the judgment result is that the target characteristic information does not comprise the first characteristic information, determining the model to be trained as a trained image generation model; and if the judgment result is that the target characteristic information comprises the first characteristic information, continuing to train the model to be trained.
Optionally, after the target feature information and the first feature information of the target object are fused to generate a sample target image, the method further includes:
calculating a target structure similarity loss value of the sample image and the sample target image;
determining the model to be trained as a trained image generation model when the first loss value, the second loss value, the third loss value and the target structure similarity loss value meet a second preset condition; otherwise, continuing to train the model to be trained.
Optionally, the first preset condition is: the sum of the first loss value, the second loss value and the third loss value is less than a first preset loss value.
Optionally, the second preset condition is: the sum of the first loss value, the second loss value, the third loss value and the target structure similarity loss value is less than a second preset loss value.
According to a second aspect of the embodiments of the present disclosure, there is provided an image generation method, including:
acquiring an image to be processed including a target object, wherein the target object has second characteristic information, and the second characteristic information is as follows: first age characteristic information, first gender characteristic information or first color characteristic information of the target object;
inputting the image to be processed into the trained image generation model of the first aspect to obtain a target image with the third feature information, where the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information.
Optionally, before the image to be processed is input into the trained model of the first aspect, the method further includes:
carrying out image segmentation pretreatment on the image to be processed to obtain a target object of the image to be processed;
the inputting the image to be processed into the trained image generation model according to the first aspect includes:
inputting the target object into the trained image generation model of the first aspect.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for an image generation model, including:
a first image input unit configured to perform input of a sample image containing a target object into a model to be trained;
the triggering unit is configured to trigger an image generator module of the model to be trained to extract target characteristic information of the sample image, perform fusion processing on the target characteristic information and first characteristic information of the target object, and generate a sample target image; the target characteristic information is: characteristic information other than the first characteristic information, the first characteristic information being: feature information for characterizing an age feature, a gender feature, or a color feature of the target object;
a down-sampling unit configured to perform down-sampling of the sample image and the sample target image, respectively, to obtain a down-sampled sample image and a down-sampled sample target image;
a second image input unit configured to input the sample image and the sample target image into the first type of discriminator module of the model to be trained, and correspondingly input the downsampled sample image and the downsampled sample target image into the second type of discriminator module of the model to be trained;
and the first model determining unit is configured to determine the model to be trained as the trained image generation model when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition.
Optionally, the model to be trained further includes a label classifier module; the device further comprises:
the feature input unit is configured to input target feature information into the label classifier module after the image generator module of the model to be trained extracts the target feature information of the sample image, so as to obtain a judgment result, wherein the judgment result is whether the target feature information comprises the first feature information;
a second model determination unit configured to perform, if the target feature information does not include the first feature information as a result of the determination, determining the model to be trained as a trained image generation model; and if the judgment result is that the target characteristic information comprises the first characteristic information, continuing to train the model to be trained.
Optionally, the apparatus further comprises:
a loss value calculation unit configured to perform calculation of a target structure similarity loss value of the sample image and the sample target image after generating a sample target image by the fusion processing of the target feature information and the first feature information of the target object;
a third model determining unit configured to determine the model to be trained as a trained image generation model when the first loss value, the second loss value, the third loss value, and the target structure similarity loss value satisfy a second preset condition; otherwise, continuing to train the model to be trained.
Optionally, the first preset condition is: the sum of the first loss value, the second loss value and the third loss value is less than a first preset loss value.
Optionally, the second preset condition is: the sum of the first loss value, the second loss value, the third loss value and the target structure similarity loss value is less than a second preset loss value.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image generation apparatus, including:
an image acquisition unit configured to perform acquisition of an image to be processed including a target object, the target object having second feature information, the second feature information being: first age characteristic information, first gender characteristic information or first color characteristic information of the target object;
a third image input unit, configured to perform input of the image to be processed into the trained image generation model according to the first aspect, so as to obtain a target image with third feature information, where the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the third feature information is the same as the second feature information in type.
Optionally, the apparatus further comprises:
an image segmentation unit, configured to perform image segmentation preprocessing on the image to be processed before inputting the image to be processed into the trained model of the first aspect, so as to obtain a target object of the image to be processed;
the third image input unit configured to perform:
inputting the target object into the trained image generation model of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training an image generation model according to the first aspect or the method of image generation according to the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the training method of the image generation model of the first aspect or the image generation method of the second aspect.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the method for training an image generation model according to the first aspect, or the method for generating an image according to the second aspect.
According to the technical solution provided by the embodiments of the present disclosure, when the image generation model is trained, a sample image containing a target object is input into a model to be trained; an image generator module of the model to be trained extracts target feature information of the sample image, and fuses the target feature information with first feature information of the target object to generate a sample target image; the sample image and the sample target image are respectively downsampled to obtain a downsampled sample image and a downsampled sample target image; the sample image and the sample target image are input into a first type of discriminator module of the model to be trained, and the downsampled sample image and the downsampled sample target image are correspondingly input into a second type of discriminator module of the model to be trained; and when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition, the model to be trained is determined as the trained image generation model.
As can be seen, the model to be trained includes an image generator module and a plurality of discriminator modules. After the image generator module generates the sample target image, the plurality of discriminator modules can supervise the sample image and the sample target image at a plurality of different resolutions; that is, they can supervise both the global structure information and the local detail information of the sample target image. When the first loss value corresponding to the image generator module, the second loss value corresponding to the first type of discriminator module and the third loss value corresponding to the second type of discriminator module meet the first preset condition, the model to be trained is determined as the trained image generation model. As a result, the generated sample target image carries the global structure information and the local detail information of the sample image, so the sample target image is highly consistent with the sample image, that is, it restores the sample image well; and because the sample target image carries the local detail information, its definition is high.
Drawings
FIG. 1 is a flow diagram illustrating a method of training an image generation model in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating an image generation method according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating an apparatus for training an image generation model in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating an image generation apparatus according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating another electronic device in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating yet another electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to solve the technical problem that after an image is processed in the related art, the definition of the obtained image is low, the embodiments of the present disclosure provide a training method of an image generation model, an image generation method and an image generation device.
In a first aspect, a method for training an image generation model provided by an embodiment of the present disclosure is first explained in detail.
An execution subject of the training method for the image generation model provided by the embodiment of the present disclosure may be a training apparatus for the image generation model, and the apparatus may be run in an electronic device, and the electronic device may be a server or a terminal.
For clarity of description of the scheme, the structure of the model to be trained is explained first.
Specifically, the model to be trained may include an image generator module for generating an image, that is, for generating a sample target image, and a plurality of discriminator modules, each discriminator module being configured to discriminate authenticity of a sample target image of a different resolution with respect to a sample image of a corresponding resolution.
The model to be trained may be composed of a Variational Autoencoder (VAE) and a conditional Generative Adversarial Network (cGAN).
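For ease of understanding, a minimal sketch of such a structure is given below. The sketch assumes PyTorch as the framework; the layer sizes, the channel-concatenation style of conditioning and all module names are illustrative assumptions, not an implementation prescribed by the present disclosure.

```python
import torch
import torch.nn as nn

class ImageGenerator(nn.Module):
    """VAE-style generator: encodes a sample image into target feature
    information z, then decodes z fused with a condition (first feature
    information) to produce the sample target image."""
    def __init__(self, z_dim=256, label_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, z_dim, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(z_dim + label_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x, label):
        z = self.encoder(x)  # target feature information
        # fuse: broadcast the condition vector over the spatial dimensions
        lab = label[:, :, None, None].expand(-1, -1, z.size(2), z.size(3))
        return self.decoder(torch.cat([z, lab], dim=1)), z

class Discriminator(nn.Module):
    """One discriminator; the model holds one per image resolution
    (one first-type module plus several second-type modules)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),  # patch-level real/fake logits
        )

    def forward(self, img):
        return self.net(img)

# D1 at full resolution (first type), D2/D3 at 1/2 and 1/4 (second type)
discriminators = nn.ModuleList([Discriminator() for _ in range(3)])
```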
FIG. 1 is a flow diagram illustrating a method of training an image generation model according to an exemplary embodiment.
As shown in fig. 1, the training method of the image generation model may include the following steps.
In step S11, a sample image containing a target object is input to the model to be trained.
It can be understood that, when training the model to be trained, a large number of sample images are needed, and it is reasonable that the sample images may be images taken by the electronic device serving as the execution subject or locally stored images, or may be images acquired by the electronic device serving as the execution subject from other electronic devices.
Also, in practical applications, the target object contained in different sample images may be different.
Specifically, when the target objects are classified by age, the target object in the sample image may be an adult or a child.
When classified by gender, the target object in the sample image may be male or female.
When classified by color, the target object in the sample image may be a color target object or a black-and-white target object.
In step S12, the image generator module of the model to be trained extracts the target feature information of the sample image, and performs a fusion process on the target feature information and the first feature information of the target object to generate a sample target image.
Wherein the target characteristic information is: characteristic information other than the first characteristic information, and the first characteristic information is: feature information for characterizing an age feature, a gender feature, or a color feature of the target object.
Specifically, after the model to be trained receives the sample image, the image generator module of the model to be trained may extract target feature information of the sample image, where the target feature information is feature information other than the first feature information. The first feature information may be feature information for characterizing an age feature, a gender feature, or a color feature of the target object.
For example, when the target object included in the sample image is an adult, the image generator of the model to be trained may extract feature information other than age feature information in the sample image, and use the extracted feature information other than age information as the target feature information.
When the target object included in the sample image is a male, the image generator of the model to be trained may extract feature information other than sex feature information in the sample image, and use the extracted feature information other than sex feature information as target feature information.
When the target object included in the sample image is a chromatic target object, the image generator of the model to be trained may extract feature information other than color feature information in the sample image, and use the extracted feature information other than color feature information as the target feature information.
After the image generator module of the model to be trained extracts the target feature information, the target feature information and the first feature information of the target object may be fused to generate a sample target image, so that, by adding this control condition, the trained image generation model can generate results that meet the user's requirements in subsequent steps.
For example, if the target object included in the sample image is an adult, the target feature information extracted by the image generator of the model to be trained does not include the adult's age feature information. In that case, a sample target image matching some attached age feature information can be generated from the target feature information together with the attached age feature information: the image generator module of the model to be trained fuses the target feature information with the first feature information of the target object to generate the sample target image.
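Continuing the sketch above, the fusion step can be pictured as attaching a one-hot condition vector (the first feature information) to the extracted target feature information. The condition vocabulary below is purely hypothetical:

```python
import torch

def make_label(batch_size, index, label_dim=8):
    """One-hot first feature information; e.g. index 0 = "adult",
    index 1 = "child" in a hypothetical age vocabulary."""
    label = torch.zeros(batch_size, label_dim)
    label[:, index] = 1.0
    return label

# x: (B, 3, 256, 256) sample images; generator from the earlier sketch
# x_prime, z = generator(x, make_label(x.size(0), index=1))  # condition: "child"
```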
In step S13, the sample image and the sample target image are down-sampled to obtain a down-sampled sample image and a down-sampled sample target image, respectively.
After the sample target image is generated, the sample image and the sample target image may be downsampled. For example, if the sample image has a resolution of 256x256, it may be downsampled to obtain sample images with resolutions of 128x128 and 64x64. Similarly, if the sample target image has a resolution of 256x256, it may be downsampled to obtain sample target images with resolutions of 128x128 and 64x64. Of course, the downsampling factor is not specifically limited in the embodiments of the present disclosure.
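This downsampling step could be realized, for example, with bilinear interpolation; the following sketch assumes factors of 2 and 4 as in the example above (the present disclosure does not fix the resampling method):

```python
import torch.nn.functional as F

def multi_scale(img):
    """Return the image at full, 1/2 and 1/4 resolution,
    e.g. 256x256 -> [256x256, 128x128, 64x64]."""
    return [
        img,
        F.interpolate(img, scale_factor=0.5, mode="bilinear", align_corners=False),
        F.interpolate(img, scale_factor=0.25, mode="bilinear", align_corners=False),
    ]
```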
In step S14, the sample image and the sample target image are input into the first type of discriminator module of the model to be trained, and the down-sampled sample image and the down-sampled sample target image are input into the second type of discriminator module of the model to be trained correspondingly.
The model to be trained includes a plurality of discriminator modules; the sample image and the sample target image, as well as the downsampled sample image and the downsampled sample target image, can be input into different discriminator modules. For clarity, the discriminator module that receives the sample image and the sample target image may be called the first type of discriminator module, of which there is usually 1. The discriminator modules that receive the downsampled sample image and the downsampled sample target image may be called the second type of discriminator modules, whose number may be determined by the actual situation; for example, if the sample image and the sample target image are downsampled twice, the number of the second type of discriminator modules may be 2. The number of the second type of discriminator modules is not specifically limited in the embodiments of the present disclosure.
In step S15, when the first loss value corresponding to the image generator module, the second loss value corresponding to the first type of discriminator module, and the third loss value corresponding to the second type of discriminator module satisfy a first preset condition, the model to be trained is determined as the trained image generation model.
In this step, a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module, and a third loss value corresponding to the second type of discriminator module may be calculated. When the first loss value, the second loss value and the third loss value meet the first preset condition, the distance between the sample image and the sample target image is small, that is, the two are close enough. Moreover, the sample target image then contains the global structure information and the local detail information of the sample image, i.e., the definition of the sample target image is high; at this point, the model to be trained can be determined as the trained image generation model. Otherwise, the image generator module and each discriminator module need to be trained alternately until the trained image generation model is obtained.
In an alternative embodiment, the first preset condition is: the sum of the first loss value, the second loss value and the third loss value is less than a first preset loss value. Moreover, the first preset loss value can be determined according to actual conditions, and the size of the first preset loss value is not specifically limited in the present disclosure.
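Combining the helpers above, one alternating training step together with the first preset condition might be sketched as follows. The BCE adversarial loss, the optimizers and the threshold are assumptions of this sketch:

```python
import torch
from torch.nn.functional import binary_cross_entropy_with_logits as bce

def train_step(generator, discriminators, opt_g, opt_d, x, label, threshold):
    x_fake, _ = generator(x, label)
    reals, fakes = multi_scale(x), multi_scale(x_fake)

    # discriminator update: real images -> 1, generated images -> 0
    opt_d.zero_grad()
    loss_d = 0.0
    for d, xr, xf in zip(discriminators, reals, fakes):
        real_logits, fake_logits = d(xr), d(xf.detach())
        loss_d = loss_d + bce(real_logits, torch.ones_like(real_logits)) \
                        + bce(fake_logits, torch.zeros_like(fake_logits))
    loss_d.backward()
    opt_d.step()

    # generator update: try to fool every discriminator
    opt_g.zero_grad()
    loss_g = 0.0
    for d, xf in zip(discriminators, fakes):
        logits = d(xf)
        loss_g = loss_g + bce(logits, torch.ones_like(logits))
    loss_g.backward()
    opt_g.step()

    # first preset condition: the summed loss values fall below a preset value
    return (loss_g + loss_d).item() < threshold
```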
Therefore, the trained image generation model can generate a sample target image matched with the attached first characteristic information according to the target characteristic information and the attached first characteristic information under the condition that the target characteristic information does not include the first characteristic information. The first characteristic information may be age characteristic information, gender characteristic information, or color characteristic information.
According to the technical solution provided by the embodiments of the present disclosure, when the image generation model is trained, a sample image containing a target object is input into a model to be trained; an image generator module of the model to be trained extracts target feature information of the sample image, and fuses the target feature information with first feature information of the target object to generate a sample target image; the sample image and the sample target image are respectively downsampled to obtain a downsampled sample image and a downsampled sample target image; the sample image and the sample target image are input into a first type of discriminator module of the model to be trained, and the downsampled sample image and the downsampled sample target image are correspondingly input into a second type of discriminator module of the model to be trained; and when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition, the model to be trained is determined as the trained image generation model.
As can be seen, the model to be trained includes an image generator module and a plurality of discriminator modules. After the image generator module generates the sample target image, the plurality of discriminator modules can supervise the sample image and the sample target image at a plurality of different resolutions; that is, they can supervise both the global structure information and the local detail information of the sample target image. When the first loss value corresponding to the image generator module, the second loss value corresponding to the first type of discriminator module and the third loss value corresponding to the second type of discriminator module meet the first preset condition, the model to be trained is determined as the trained image generation model. As a result, the generated sample target image carries the global structure information and the local detail information of the sample image, so the sample target image is highly consistent with the sample image, that is, it restores the sample image well; and because the sample target image carries the local detail information, its definition is high.
In order to make the target feature information extracted by the image generator module contain no first feature information of the target object at all, on the basis of the embodiment shown in fig. 1, in an optional implementation manner, the model to be trained may further include a label classifier module;
after the image generator module of the model to be trained extracts the target feature information of the sample image, the training method of the image generation model may further include the following steps A1 and A2:
step A1, inputting the target characteristic information into the label classifier module to obtain a judgment result, wherein the judgment result is whether the target characteristic information includes the first characteristic information.
In this step, after the image generator module of the model to be trained extracts the target feature information of the sample image, the target feature information may be input into the tag classifier module. The input of the tag classifier module is target feature information, and the output result of the tag classifier module may be: the target feature information includes the first feature information, or the target feature information does not include the first feature information.
For example, consider an application scene in which the target object included in the sample image is an adult: the age information of the target object is "adult", and the target feature information should not contain this adult age information. The target feature information is input into the tag classifier module, whose output can be used to judge whether the target feature information was generated from an adult image. If the tag classifier module can tell that the target feature information came from an adult image, the target feature information still contains age information; in this case the image generator module, the plurality of discriminator modules and the tag classifier module need to be trained alternately so that the age information in the target feature information is progressively removed. In this way, when the trained image generation model is later used to process an image, the generated target image has a better effect, for example, an adult image is changed completely into a child-face image.
Step A2, if the judgment result is that the target characteristic information does not include the first characteristic information, determining the model to be trained as a trained image generation model; and if the judgment result is that the target characteristic information comprises the first characteristic information, continuing training the model to be trained.
Specifically, if the determination result is that the target feature information does not include the first feature information, the first feature information has been removed from the target feature information fairly thoroughly; thus, in subsequent steps, when the trained image generation model is used to generate a target image, the interference caused by first feature information remaining in the target feature information can be reduced, and the model to be trained can be determined as the trained image generation model at this time. On the contrary, if the determination result is that the target feature information includes the first feature information, the first feature information has not been removed thoroughly; to prevent residual first feature information from interfering with the target image when the trained image generation model is used in subsequent steps, the model to be trained needs to be trained further.
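A minimal sketch of such a label classifier module and its two-sided loss is given below; the pooling head and the use of cross-entropy are assumptions of the sketch, not the prescribed design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelClassifier(nn.Module):
    """Judges from the target feature information z whether the first
    feature information is still present (e.g. adult vs. child origin)."""
    def __init__(self, z_dim=256, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(z_dim, n_classes))

    def forward(self, z):
        return self.head(z)

def classifier_losses(classifier, z, x_label):
    """x_label supervises the classifier; the generator receives the loss
    with the opposite sign, pushing it to strip the label information from z."""
    loss_classifier = F.cross_entropy(classifier(z), x_label)
    return loss_classifier, -loss_classifier  # (classifier loss, generator loss)
```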
Therefore, with the technical solution of this embodiment of the disclosure, the first feature information is removed from the target feature information more thoroughly, so that in subsequent steps, when the trained image generation model is used to generate a target image, the interference caused by first feature information remaining in the target feature information can be reduced, and the image processing result is better.
In addition, in order to make the structure of the sample target image similar to that of the sample image, after the target feature information is fused with the first feature information of the target object to generate the sample target image based on the embodiment shown in fig. 1 or fig. 2, the training method for the image generation model may further include the following two steps, namely step B1 and step B2:
and step B1, calculating the target structure similarity loss value of the sample image and the sample target image.
Step B2, when the first loss value, the second loss value, the third loss value and the target structure similarity loss value meet a second preset condition, determining the model to be trained as a trained image generation model; otherwise, continuing to train the model to be trained.
In this embodiment, a structural similarity loss function may be set in the model to be trained, and a target structural similarity loss value between the sample image and the sample target image is calculated using the structural similarity loss function. When the first loss value, the second loss value, the third loss value and the target structural similarity loss value satisfy the second preset condition, the structural similarity between the sample image and the sample target image is high, so the model to be trained may be determined as the trained image generation model. Otherwise, the model to be trained needs to be trained further until the trained image generation model is obtained. The target structural similarity loss value can be measured in various ways, for example by SSIM (Structural Similarity Index).
In an alternative embodiment, the second preset condition is: the sum of the first loss value, the second loss value, the third loss value and the target structure similarity loss value is less than a second preset loss value. The present disclosure does not specifically limit the magnitude of the second preset loss value.
Therefore, with the technical solution of this embodiment of the disclosure, the structural similarity loss between the sample image and the sample target image is kept small, so that when the trained image generation model is later used to generate a target image, the structure of the generated image is closer to that of the input, and the generated image is more realistic.
For clarity, the training of the image generation model is described below with a specific example.
For example, the sample image is x with a resolution of 256x256, and the sample target image generated by the image generator module is x'.
Before (x, x') is sent to the discriminator modules, the sample image and the sample target image are each downsampled by factors of 2 and 4, giving resolutions of 128x128 and 64x64 respectively. This finally yields image pairs at three resolutions: (x, x'), (x_128, x'_128) and (x_64, x'_64), where x_128 and x_64 are obtained by downsampling x, and x'_128 and x'_64 are obtained by downsampling x'.
The model to be trained is designed with one discriminator module per resolution, that is, it includes 3 discriminator modules D1, D2 and D3, each discriminating images at one of the three resolutions.
Next, the loss values of D1, D2 and D3 on the generated images at the three resolutions (x', x'_128, x'_64) are calculated as Loss_D1, Loss_D2 and Loss_D3, respectively. The total loss value corresponding to the 3 discriminator modules is Loss_GAN = Loss_D1 + Loss_D2 + Loss_D3. That is, in the technical solution provided by the embodiments of the present disclosure, losses at multiple scales supervise the image generator module so that it produces the global structure and the local details at the same time.
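In code, reusing the multi_scale helper and assuming discriminators D1, D2 and D3 as in the earlier sketches, this computation might look as follows:

```python
import torch
from torch.nn.functional import binary_cross_entropy_with_logits as bce

def discriminator_loss(d, real, fake):
    """Per-scale real/fake loss (BCE, as in the train_step sketch)."""
    real_logits, fake_logits = d(real), d(fake.detach())
    return (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))

# x: sample image, x_prime: generated sample target image, both 256x256
x_256, x_128, x_64 = multi_scale(x)
xp_256, xp_128, xp_64 = multi_scale(x_prime)
loss_gan = (discriminator_loss(D1, x_256, xp_256)    # Loss_D1
            + discriminator_loss(D2, x_128, xp_128)  # Loss_D2
            + discriminator_loss(D3, x_64, xp_64))   # Loss_D3; sum = Loss_GAN
```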
Moreover, in order to solve the problem in the related art that the adult-to-child face conversion is incomplete, the model to be trained may further include a Label Classifier module, specifically as follows:
1. Assuming the real age feature information of x is x_label, a label classifier (Label Classifier) is designed. The label classifier takes the target feature information z (which should not include x_label) as input and outputs a result distinguishing whether z was generated from an adult image or a child-face image, and its capability is continually optimized using x_label as the supervision signal.
2. The loss value Loss_Classifier of the label classifier is used as a negative loss of the model to be trained, so the model to be trained must generate z containing as little age feature information as possible, which drives the classifier's Loss_Classifier up.
3. As the model to be trained is trained alternately in each round, the age feature information in z is continuously removed; therefore, in subsequent steps, the adult face can be changed into a child face more thoroughly.
Also, to solve the problem in the related art that the converted face does not resemble the original person, the model to be trained may include a structural similarity loss (SSIM loss) to constrain the generated result x' to be structurally close to the input x. Thus, Loss_SSIM is defined as: Loss_SSIM = SSIM(x', x).
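As a sketch, this term can be computed with any differentiable SSIM implementation; the example below assumes the third-party pytorch-msssim package and images scaled to [0, 1]:

```python
# assumes the third-party package: pip install pytorch-msssim
from pytorch_msssim import ssim

def loss_ssim(x_prime, x):
    """Loss_SSIM = SSIM(x', x) as defined above; larger values mean the
    generated result is structurally closer to the input (in practice a
    trainer would typically minimise 1 - SSIM)."""
    return ssim(x_prime, x, data_range=1.0)
```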
In summary, the loss values of the model to be trained may include: Loss_GAN (adversarial loss); Loss_recon (reconstruction loss), which characterizes the distance between x' and x; Loss_SSIM (structural similarity loss); and Loss_Classifier. When the model to be trained is trained, if the sum of these four loss values is less than a third preset loss value, the model to be trained can be determined as the trained image generation model. The third preset loss value may be set according to the actual situation, which is not specifically limited in the embodiments of the present disclosure.
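A sketch of the combined objective and the third preset condition follows; the L1 reconstruction distance and the equal weighting of the four terms are assumptions:

```python
import torch.nn.functional as F

def total_loss(x_prime, x, loss_gan, loss_classifier):
    loss_recon = F.l1_loss(x_prime, x)   # Loss_recon: distance between x' and x
    l_ssim = loss_ssim(x_prime, x)       # Loss_SSIM from the sketch above
    return loss_gan + loss_recon + l_ssim + loss_classifier

# third preset condition: training may stop once
#   total_loss(x_prime, x, loss_gan, loss_classifier) < third_preset_loss_value
```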
In a second aspect, an image generation method provided by the embodiment of the present disclosure is explained in detail.
An execution subject of the image generation method provided by the embodiment of the present disclosure may be an image generation apparatus, where the apparatus may be run in an electronic device, and the electronic device may be a server or a terminal.
As shown in fig. 2, an embodiment of the present disclosure provides an image generation method, including the following steps.
In step S21, an image to be processed including the target object is acquired.
The target object has second characteristic information, and the second characteristic information is: first age characteristic information, first gender characteristic information, or first color characteristic information of the target object.
The electronic device serving as the execution subject may acquire the image to be processed in various ways: for example, it may use an image it has just captured as the image to be processed, take any previously captured image as the image to be processed, or use an image acquired from another electronic device. All of these are reasonable.
In practical applications, the target objects included in the image to be processed are usually different for different application scenes.
For example, in one application scenario, a user wants to change an adult image into a child image; in this case, the target object included in the image to be processed is an adult. In another application scenario, the user wants to change a child image into an adult image; in this case, the target object included in the image to be processed is a child.
For another example, in another application scenario, the user wants to change the image of a male into an image of a female; in this case, the target object included in the image to be processed is a male. In another application scenario, a user wants to change a female image into a male image; in this case, the target object included in the image to be processed is a female.
For another example, in another application scenario, the user wants to change a color image into a black-and-white image, and the target object included in the image to be processed is a color target object. In another application scenario, a user wants to change a black-and-white image into a color image, and at this time, the target object included in the image to be processed is a black-and-white target object.
As can be seen from the above description, the target object has the second characteristic information. The second characteristic information may be first age characteristic information of the target object, the first age characteristic information being used to characterize an age characteristic of the target object. The second characteristic information may also be first gender characteristic information of the target object, the first gender characteristic information being used for characterizing gender characteristics of the target object. The second feature information may also be first color feature information of the target object, the first color feature information being used to characterize a color feature of the target object.
In step S22, the image to be processed is input into the trained image generation model of the embodiment of the first aspect to obtain a target image with third feature information.
The third characteristic information is second age characteristic information, second gender characteristic information or second color characteristic information of the target object, and the type of the third characteristic information is the same as that of the second characteristic information.
After the image to be processed is acquired, it may be input into the trained image generation model of the embodiment of the first aspect. After the image generation model receives the image to be processed, it may extract the target feature information of the image to be processed, excluding the second feature information, and fuse the extracted target feature information with the third feature information, so as to obtain the target image with the third feature information. The third feature information is of the same type as the second feature information: when the second feature information is the first age feature information, the third feature information is the second age feature information; when the second feature information is the first gender feature information, the third feature information is the second gender feature information; and when the second feature information is the first color feature information, the third feature information is the second color feature information.
Specifically, in practical applications, the electronic device serving as the execution subject may receive an image processing instruction, input by the user, for processing the image to be processed; the instruction may indicate which kind of third feature information the target image generated by the image generation model should have.
For example, when the user wants to see the adult image corresponding to a child image, the user may tap a "child to adult" button on the terminal, so that the terminal receives the image processing instruction input by the user. The electronic device serving as the execution subject then inputs the image to be processed into the trained image generation model, extracts the target feature information of the image to be processed excluding the child's age feature information, and fuses the extracted target feature information with adult age feature information to generate a target image with adult age feature information, that is, an adult image. The child image is thus changed into an adult image, i.e., the model predicts how the child in the image will look when grown.
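Reusing the earlier sketches, inference on a child image might look as follows (the label index refers to the hypothetical condition vocabulary introduced above):

```python
import torch

# x: a (1, 3, 256, 256) image to be processed that contains a child
generator.eval()
with torch.no_grad():
    adult_label = make_label(1, index=0)         # third feature information: "adult"
    target_image, _ = generator(x, adult_label)  # predicted grown-up image
```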
Of course, the foregoing merely illustrates how a child can be changed into an adult using the solution provided by the embodiments of the present disclosure. The technical solution can likewise change an adult into a child, a male into a female, a female into a male, a color image into a black-and-white image, and a black-and-white image into a color image, which are not described in detail herein.
Moreover, when the second feature information is, for example, "not wearing glasses", the effect of changing from a glasses-free state to a glasses-wearing state can also be achieved. That is, the embodiments of the present disclosure do not specifically limit the second feature information; it may be any feature information characterizing the target object.
Moreover, because the model to be trained includes an image generator module, a plurality of discriminator modules, a label classifier module and a structural similarity loss function during training, the generated target image has higher definition, a better conversion effect (for example, a more thorough adult-to-child change), and is more realistic (for example, the structure of the target image is more similar to that of the image to be processed).
According to the technical scheme provided by the embodiment of the disclosure, the electronic device acquires the image to be processed including the target object, wherein the target object has second characteristic information, and the second characteristic information is as follows: first age characteristic information, first gender characteristic information, or first color characteristic information of the target object. And inputting the image to be processed into the trained image generation model in the embodiment of the first aspect to obtain a target image with third feature information, wherein the third feature information is second age feature information, second gender feature information or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information. In the process of training the image generation model, the plurality of discriminator modules can supervise the global structure information and the local detail information of the sample target image, so that the generated target image can better restore other feature information except the second feature information in the image to be processed; and the obtained target image has local detail information of the image to be processed, so that the definition of the target image is higher.
In addition, in order to reduce the influence of the background area of the image to be processed on the target image, on the basis of the embodiment shown in fig. 2, before the image to be processed is input into the trained model in the embodiment of the first aspect, the image generation method may further include:
and carrying out image segmentation pretreatment on the image to be processed to obtain a target object of the image to be processed.
Correspondingly, inputting the image to be processed into the trained model according to the embodiment of the first aspect includes:
inputting the target object into the trained model of the embodiment of the first aspect.
In order to prevent the background region of the image to be processed from affecting the definition of the target image in the subsequent steps, image segmentation preprocessing may be performed on the image to be processed to obtain a target object of the image to be processed.
For example, in the application scenario of changing an adult image into a child image, the adult image includes a background area containing, for example, a desk, a chair or a wall hanging; if these were input into the pre-trained image generation model in subsequent steps, they would affect the definition of the target image. Therefore, head segmentation may be performed on the adult image to remove the interference of the background region, so that only the target object of the image to be processed is input into the pre-trained image generation model. For example, only the head region may be input, so that the pre-trained image generation model can focus on transforming the head region.
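A sketch of this preprocessing is given below. It assumes a binary head mask obtained from some off-the-shelf portrait segmentation model; how that mask is produced is outside the scope of the sketch:

```python
import torch

def keep_target_object(image, mask):
    """Zero out the background so that only the segmented target object
    (e.g. the head region) reaches the image generation model.
    `mask` is assumed to be a (1, 1, H, W) tensor with values in {0, 1}."""
    return image * mask

# x_head = keep_target_object(x, head_mask)
# target_image, _ = generator(x_head, adult_label)
```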
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for an image generation model, as shown in fig. 3, including:
a first image input unit 310 configured to perform input of a sample image containing a target object into a model to be trained;
the triggering unit 320 is configured to trigger the image generator module of the model to be trained to extract the target feature information of the sample image, and perform fusion processing on the target feature information and the first feature information of the target object to generate a sample target image; the target characteristic information is: characteristic information other than the first characteristic information, the first characteristic information being: feature information for characterizing an age feature, a gender feature, or a color feature of the target object;
a down-sampling unit 330 configured to perform down-sampling the sample image and the sample target image respectively to obtain a down-sampled sample image and a down-sampled sample target image;
a second image input unit 340 configured to input the sample image and the sample target image into a first type of discriminator module of the model to be trained, and correspondingly input the down-sampled sample image and the down-sampled sample target image into a second type of discriminator module of the model to be trained;
a first model determining unit 350, configured to determine the model to be trained as the trained image generation model when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module, and a third loss value corresponding to the second type of discriminator module satisfy a first preset condition.
In an alternative embodiment, the first preset condition is: the sum of the first loss value, the second loss value and the third loss value is less than a first preset loss value.
As can be seen, the model to be trained includes an image generator module and a plurality of discriminator modules. After the image generator module generates the sample target image, the plurality of discriminator modules supervise the sample image and the sample target image at a plurality of different resolutions; that is, they supervise both the global structure information and the local detail information of the sample target image. When the first loss value corresponding to the image generator module, the second loss value corresponding to the first type of discriminator module, and the third loss value corresponding to the second type of discriminator module satisfy the first preset condition, the model to be trained is determined as the trained image generation model. The generated sample target image thus carries the global structure information and the local detail information of the sample image and has higher consistency with the sample image, that is, it better restores the sample image; and because it retains the local detail information, the definition of the sample target image is higher, as the sketch below illustrates.
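As a concrete illustration, the following Python (PyTorch) sketch shows one possible form of such a training step, with a full-resolution discriminator d_full (the first type) and a down-sampled discriminator d_down (the second type). The module names, the hinge-style adversarial loss, and the 2x average-pooling down-sampling are assumptions made for illustration, not the patented implementation:

    import torch
    import torch.nn.functional as F

    def training_step(generator, d_full, d_down, opt_g, opt_d,
                      sample_image, first_feature, first_preset_loss):
        # The generator extracts target features and fuses them with the
        # first feature information to produce the sample target image.
        sample_target = generator(sample_image, first_feature)

        # Down-sample both images for the second type of discriminator.
        sample_image_ds = F.avg_pool2d(sample_image, kernel_size=2)
        sample_target_ds = F.avg_pool2d(sample_target, kernel_size=2)

        # Second and third loss values: discriminators on the full-resolution
        # pair and on the down-sampled pair (generated images detached).
        loss_d_full = (F.relu(1 - d_full(sample_image)).mean()
                       + F.relu(1 + d_full(sample_target.detach())).mean())
        loss_d_down = (F.relu(1 - d_down(sample_image_ds)).mean()
                       + F.relu(1 + d_down(sample_target_ds.detach())).mean())
        opt_d.zero_grad()
        (loss_d_full + loss_d_down).backward()
        opt_d.step()

        # First loss value: the generator tries to fool both discriminators,
        # so it is supervised at the global and the local scale at once.
        loss_g = -(d_full(sample_target).mean() + d_down(sample_target_ds).mean())
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()

        # First preset condition: the sum of the three loss values falls
        # below a preset threshold.
        total = loss_g.item() + loss_d_full.item() + loss_d_down.item()
        return loss_g.item(), loss_d_full.item(), loss_d_down.item(), total < first_preset_loss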
Optionally, the model to be trained further includes a label classifier module; the device further comprises:
the feature input unit is configured to input the target feature information into the label classifier module after the image generator module of the model to be trained extracts the target feature information of the sample image, so as to obtain a judgment result, where the judgment result indicates whether the target feature information includes the first feature information;
a second model determination unit configured to: if the judgment result is that the target feature information does not include the first feature information, determine the model to be trained as a trained image generation model; and if the judgment result is that the target feature information includes the first feature information, continue to train the model to be trained.
Therefore, with the technical scheme of the embodiment of the disclosure, the first feature information is removed from the target feature information more thoroughly, so that in the subsequent steps, when the trained image generation model is used to generate the target image, the interference caused by first feature information remaining in the target feature information is reduced and the image processing result is better.
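Continuing the PyTorch assumptions above, a hedged sketch of this check might look as follows; label_classifier is an assumed module that outputs a logit for whether the first feature information (for example, an age label) is still detectable in the extracted target feature information:

    def first_feature_removed(label_classifier, target_features, threshold=0.5):
        # Return True when the label classifier can no longer find the
        # first feature information in the target feature information.
        with torch.no_grad():
            prob = torch.sigmoid(label_classifier(target_features))
        return bool((prob < threshold).all())

If the function returns True, the first feature information has been removed thoroughly and the model to be trained can be determined as the trained image generation model; otherwise training continues.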
Optionally, the apparatus further comprises:
a loss value calculation unit configured to perform calculation of a target structure similarity loss value of the sample image and the sample target image after generating a sample target image by the fusion processing of the target feature information and the first feature information of the target object;
a third model determining unit configured to determine the model to be trained as a trained image generation model when the first loss value, the second loss value, the third loss value, and the target structure similarity loss value satisfy a second preset condition; otherwise, continuing to train the model to be trained.
In an alternative embodiment, the second preset condition is: the sum of the first loss value, the second loss value, the third loss value and the target structure similarity loss value is less than a second preset loss value.
Therefore, with the technical scheme of the embodiment of the disclosure, the structural similarity loss between the sample image and the sample target image is kept small, so that when the trained image generation model is used to generate a target image in the subsequent steps, the structure of the generated image stays closer to that of the input image, and the generated target image is more realistic.
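As an illustration, a simplified single-window form of the structural similarity (SSIM) loss and of the second preset condition could be written as follows; a production version would typically use a sliding Gaussian window as in the standard SSIM formulation, and the constants assume pixel values scaled to [0, 1]:

    def structure_similarity_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
        # Single-window SSIM over the whole image: compare the means,
        # variances, and covariance of the sample image x and the sample
        # target image y.
        mu_x, mu_y = x.mean(), y.mean()
        var_x, var_y = x.var(), y.var()
        cov_xy = ((x - mu_x) * (y - mu_y)).mean()
        ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
            (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
        return 1 - ssim  # close to 0 when the two structures match

    # Second preset condition (illustrative): the four loss values summed.
    # if loss_g + loss_d_full + loss_d_down \
    #         + structure_similarity_loss(sample_image, sample_target) < second_preset_loss:
    #     ...  # take the model to be trained as the trained image generation model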
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image generation apparatus, as shown in fig. 4, including:
an image obtaining unit 410 configured to acquire an image to be processed including a target object, the target object having second feature information, the second feature information being: first age feature information, first gender feature information, or first color feature information of the target object;
a third image input unit 420 configured to input the image to be processed into the trained image generation model of the third aspect, so as to obtain a target image with third feature information, where the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information.
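Tying the two units together, an illustrative inference flow (the names are assumptions, not the patented API) is simply:

    def generate_target_image(trained_generator, image_to_process, target_label):
        # target_label encodes the desired third feature information, for
        # example a target age; no gradients are needed at inference time.
        with torch.no_grad():
            return trained_generator(image_to_process, target_label)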
According to the technical scheme provided by the embodiment of the disclosure, the electronic device acquires an image to be processed including a target object, where the target object has second feature information, the second feature information being: first age feature information, first gender feature information, or first color feature information of the target object. The image to be processed is input into the trained image generation model in the embodiment of the first aspect to obtain a target image with third feature information, where the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information. Because the plurality of discriminator modules supervise both the global structure information and the local detail information of the sample target image during training of the image generation model, the generated target image has the global structure information and the local detail information of the image to be processed and better restores the feature information of the image to be processed other than the second feature information; and because the target image retains the local detail information of the image to be processed, the definition of the target image is higher.
Optionally, the apparatus further comprises:
an image segmentation unit, configured to perform image segmentation preprocessing on the image to be processed before inputting the image to be processed into the trained model of the third aspect, so as to obtain a target object of the image to be processed;
the third image input unit configured to perform:
inputting the target object into the trained image generation model of the third aspect.
It can be seen that by performing, for example, head segmentation on the image to be processed, only the target object needs to be input into the pre-trained image generation model, which removes the interference of the background area.
Fig. 5 is a block diagram of an electronic device according to an example embodiment. Referring to fig. 5, the electronic device includes:
a processor 510;
a memory 520 for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training an image generation model according to the first aspect or the method of image generation according to the second aspect.
Fig. 6 is a block diagram illustrating another electronic device 600 according to an example embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor component 614 may also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the training method of the image generation model according to the first aspect, or the image generation method according to the second aspect.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method is also provided. Alternatively, for example, the storage medium may be a non-transitory computer-readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram illustrating an electronic device 700 according to an example embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, apparatus 700 includes a processing component 722 that further includes one or more processors and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform the method of training an image generation model according to the first aspect, or the method of image generation according to the second aspect.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In yet another aspect, the present disclosure also provides a storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, they enable the electronic device to perform the training method of the image generation model according to the first aspect, or the image generation method according to the second aspect.
According to yet another aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the method of training an image generation model according to the first aspect or the method of image generation according to the second aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A training method of an image generation model is characterized by comprising the following steps:
inputting a sample image containing a target object into a model to be trained;
the image generator module of the model to be trained extracts target feature information of the sample image, and performs fusion processing on the target feature information and first feature information of the target object to generate a sample target image; the target feature information is: feature information other than the first feature information, and the first feature information is: feature information for characterizing an age feature, a gender feature, or a color feature of the target object;
respectively performing down-sampling on the sample image and the sample target image to obtain a down-sampled sample image and a down-sampled sample target image;
inputting the sample image and the sample target image into a first type discriminator module of the model to be trained, and correspondingly inputting the down-sampled sample image and the down-sampled sample target image into a second type discriminator module of the model to be trained;
and when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition, determining the model to be trained as a trained image generation model.
2. The method of claim 1, wherein the model to be trained further comprises a label classifier module;
after the image generator module of the model to be trained extracts the target feature information of the sample image, the method further comprises:
inputting the target feature information into the tag classifier module to obtain a judgment result, wherein the judgment result indicates whether the target feature information comprises the first feature information;
if the judgment result is that the target characteristic information does not comprise the first characteristic information, determining the model to be trained as a trained image generation model; and if the judgment result is that the target characteristic information comprises the first characteristic information, continuing to train the model to be trained.
3. The method according to claim 1 or 2, wherein after the fusion processing of the target feature information and the first feature information of the target object to generate a sample target image, the method further comprises:
calculating a target structure similarity loss value of the sample image and the sample target image;
determining the model to be trained as a trained image generation model when the first loss value, the second loss value, the third loss value and the target structure similarity loss value meet a second preset condition; otherwise, continuing to train the model to be trained.
4. The method according to claim 1 or 2, characterized in that the first preset condition is: the sum of the first loss value, the second loss value and the third loss value is less than a first preset loss value.
5. The method according to claim 3, characterized in that the second preset condition is: the sum of the first loss value, the second loss value, the third loss value and the target structure similarity loss value is less than a second preset loss value.
6. An image generation method, comprising:
acquiring an image to be processed including a target object, wherein the target object has second characteristic information, and the second characteristic information is as follows: first age characteristic information, first gender characteristic information or first color characteristic information of the target object;
inputting the image to be processed into an image generation model trained by the training method according to any one of claims 1 to 5, to obtain a target image having third feature information, wherein the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information.
7. An apparatus for training an image generation model, comprising:
a first image input unit configured to perform input of a sample image containing a target object into a model to be trained;
the triggering unit is configured to trigger an image generator module of the model to be trained to extract target feature information of the sample image, perform fusion processing on the target feature information and first feature information of the target object, and generate a sample target image; the target feature information is: feature information other than the first feature information, and the first feature information is: feature information for characterizing an age feature, a gender feature, or a color feature of the target object;
a down-sampling unit configured to perform down-sampling of the sample image and the sample target image, respectively, to obtain a down-sampled sample image and a down-sampled sample target image;
the second image input unit is configured to input the sample image and the sample target image into a first type discriminator module of the model to be trained, and correspondingly input the down-sampled sample image and the down-sampled sample target image into a second type discriminator module in the model to be trained;
and the first model determining unit is configured to determine the model to be trained as the trained image generation model when a first loss value corresponding to the image generator module, a second loss value corresponding to the first type of discriminator module and a third loss value corresponding to the second type of discriminator module meet a first preset condition.
8. An image generation apparatus, comprising:
an image acquisition unit configured to acquire an image to be processed including a target object, the target object having second feature information, the second feature information being: first age feature information, first gender feature information, or first color feature information of the target object;
a third image input unit configured to input the image to be processed into an image generation model trained by the training method according to any one of claims 1 to 5, so as to obtain a target image having third feature information, wherein the third feature information is second age feature information, second gender feature information, or second color feature information of the target object, and the type of the third feature information is the same as that of the second feature information.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training an image generation model according to any one of claims 1 to 5 or the method of image generation according to claim 6.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a training method of an image generation model according to any one of claims 1 to 5, or an image generation method according to claim 6.
CN201911215480.2A 2019-12-02 2019-12-02 Training method of image generation model, image generation method and device Active CN111145080B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911215480.2A CN111145080B (en) 2019-12-02 2019-12-02 Training method of image generation model, image generation method and device


Publications (2)

Publication Number Publication Date
CN111145080A (en) 2020-05-12
CN111145080B CN111145080B (en) 2023-06-23

Family

ID=70517501


Country Status (1)

Country Link
CN (1) CN111145080B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416310A (en) * 2018-03-14 2018-08-17 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108460812A (en) * 2018-04-04 2018-08-28 北京红云智胜科技有限公司 A kind of expression packet generation system and method based on deep learning
CN108985215A (en) * 2018-07-09 2018-12-11 Oppo(重庆)智能科技有限公司 A kind of image processing method, picture processing unit and terminal device
CN109308450A (en) * 2018-08-08 2019-02-05 杰创智能科技股份有限公司 A kind of face's variation prediction method based on generation confrontation network
CN109308725A (en) * 2018-08-29 2019-02-05 华南理工大学 A kind of system that expression interest figure in mobile terminal generates
CN110310247A (en) * 2019-07-05 2019-10-08 Oppo广东移动通信有限公司 Image processing method, device, terminal and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAIPING ZHU 等: "Facial Aging and Rejuvenation by Conditional Multi-Adversarial Autoencoder with Ordinal Regression", ARXIV:1804.02740V1 [CS.CV] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699952A (en) * 2021-01-06 2021-04-23 哈尔滨市科佳通用机电股份有限公司 Train fault image amplification method and system based on deep learning
CN112785495A (en) * 2021-01-27 2021-05-11 驭势科技(南京)有限公司 Image processing model training method, image generation method, device and equipment
WO2022160239A1 (en) * 2021-01-27 2022-08-04 驭势科技(浙江)有限公司 Image processing model training method and apparatus, image generation method and apparatus, and device

Also Published As

Publication number Publication date
CN111145080B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
TWI775091B (en) Data update method, electronic device and storage medium thereof
WO2020093837A1 (en) Method for detecting key points in human skeleton, apparatus, electronic device, and storage medium
CN110517185B (en) Image processing method, device, electronic equipment and storage medium
EP2977956B1 (en) Method, apparatus and device for segmenting an image
CN107944447B (en) Image classification method and device
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
CN107133354B (en) Method and device for acquiring image description information
CN110399841B (en) Video classification method and device and electronic equipment
EP3996379A1 (en) Video cover determining method and device, and storage medium
CN107220614B (en) Image recognition method, image recognition device and computer-readable storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
EP3098765A1 (en) Method and apparatus for recommending cloud card
CN114266840A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110399934A (en) A kind of video classification methods, device and electronic equipment
CN113409342A (en) Training method and device for image style migration model and electronic equipment
CN111526287A (en) Image shooting method, image shooting device, electronic equipment, server, image shooting system and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN111145080B (en) Training method of image generation model, image generation method and device
CN111783517A (en) Image recognition method and device, electronic equipment and storage medium
CN113691833B (en) Virtual anchor face changing method and device, electronic equipment and storage medium
CN113032627A (en) Video classification method and device, storage medium and terminal equipment
CN110110742B (en) Multi-feature fusion method and device, electronic equipment and storage medium
CN110019965B (en) Method and device for recommending expression image, electronic equipment and storage medium
CN112347911A (en) Method and device for adding special effects of fingernails, electronic equipment and storage medium
CN105635573B (en) Camera visual angle regulating method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant