CN111340745B - Image generation method and device, storage medium and electronic equipment

Info

Publication number
CN111340745B
Authority
CN
China
Prior art keywords
image, loss, mask data, training, new seed
Prior art date: 2020-03-27
Legal status
Active
Application number
CN202010227209.7A
Other languages
Chinese (zh)
Other versions
CN111340745A (en
Inventor
袁霖
田野
何世伟
Current Assignee
Chengdu Anyixun Technology Co ltd
Original Assignee
Chengdu Anyixun Technology Co ltd
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-01-05
Application filed by Chengdu Anyixun Technology Co ltd
Priority to CN202010227209.7A
Publication of CN111340745A
Application granted
Publication of CN111340745B

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration; G06T5/50 by the use of more than one image, e.g. averaging, subtraction
    • G06T7/00 Image analysis; G06T7/10 Segmentation; Edge detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination; G06T2207/20221 Image fusion; Image merging

Abstract

The application provides an image generation method, an image generation apparatus, a storage medium and an electronic device. The method includes: acquiring a first image, a second image, first mask data and second mask data, where each mask value in the first mask data corresponds to the category to which the corresponding position on the first image belongs, and each mask value in the second mask data corresponds to the color of the corresponding position on the second image; iteratively inputting the first image, the second image, the first mask data, the second mask data and a seed image into an image generation model for processing, calculating a loss based on the seed image, and updating the weight parameters in the image generation model based on the loss; and, after the iteration ends, selecting one seed image from the seed images obtained over the iterations according to the obtained losses and taking it as the final target image. By using the color of the second image as an optimization constraint, the embodiments of the application achieve better image synthesis and obtain a high-quality target image.

Description

Image generation method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image generation method, an image generation apparatus, a storage medium, and an electronic device.
Background
In image processing, there is a demand for image synthesis, namely generating a new image from existing images according to some algorithm. This technology can be widely applied in fields such as image quality improvement, image beautification, automatic layout design and image style transfer. Existing image generation algorithms based on generative adversarial networks (GAN) mainly perform image generation by modifying and adjusting the global distribution. Images generated by such algorithms suffer from unstable local features, and details are highly likely to be distorted after synthesis; for example, regular picture elements such as buildings may be locally bent or warped.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image generation method, an image generation apparatus, a storage medium, and an electronic device, so as to solve the above problems in the prior art.
In a first aspect, an embodiment of the present application provides an image generation method, where the method includes: acquiring a first image, a second image, first mask data and second mask data, wherein each mask value in the first mask data corresponds to a category to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of the corresponding position on the second image; inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image; calculating corresponding loss based on the new seed image, and updating a weight parameter in the image generation model based on the loss; iteratively executing a process of inputting the first image, the second image, the first mask data, the second mask data and a new seed image obtained by last processing into an image generation model for processing, calculating corresponding loss based on the new seed image obtained by current iteration, and updating a weight parameter in the image generation model based on the loss; and after the iteration is finished, selecting a new seed image from a plurality of new seed images obtained by multiple iterations according to the obtained loss, and taking the new seed image as a final target image.
In the above technical solution, the original seed image is used to fuse the texture of the first image with the style of the second image, and the seed image is iterated continuously by gradient descent; during iteration, the loss value measuring the similarity between the seed image and the expected target decreases continuously, finally achieving image synthesis. Moreover, throughout the process, the texture details in the seed image are not changed; only the color of the corresponding region is changed based on the masks, which effectively avoids the object-distortion problem of GAN-family image generation algorithms.
In an optional implementation, the second mask data is obtained by preprocessing the second image with a first image segmentation model, and the first image segmentation model is obtained by training an instance segmentation network. The method further comprises: acquiring a first training sample set for training, where each training sample in the first training sample set has been labeled with mask values according to a preset color mapping table, and regions of the training samples that fall within the same color interval set in the color mapping table are labeled with the same mask value; and training an instance segmentation network with the first training sample set, obtaining the first image segmentation model after training is finished.
The method and apparatus use the color information carried in the second mask data as a constraint to continuously optimize the seed image. Therefore, before the first image segmentation model is trained, mask value labeling needs to be performed on the images according to a preset color mapping table to obtain the training samples.
In an optional implementation, the first mask data is obtained by preprocessing the first image with a second image segmentation model, and the second image segmentation model is obtained by training an instance segmentation network. The method further comprises: acquiring a second training sample set for training, where each training sample in the second training sample set has been labeled with mask values, and regions belonging to the same category on the training samples are labeled with the same mask value; and training the instance segmentation network with the second training sample set, obtaining the second image segmentation model after training is finished.
In an optional embodiment, before inputting the first image, the second image, the first mask data, the second mask data, and the raw seed image into an image generation model for processing, the method further comprises: setting the sizes of the first image, the second image, the first mask data, the second mask data and the original seed image to be the sizes required by the input of the image generation model; the inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing includes: inputting the first image, the second image, the first mask data, the second mask data and the original seed image with the set sizes into an image generation model for processing.
In an alternative embodiment, the selecting a new seed image from a plurality of new seed images obtained from a plurality of iterations according to the obtained loss as the final target image includes: and determining a new seed image with the lowest loss value from a plurality of new seed images obtained by a plurality of iterations as a final target image.
In an alternative embodiment, the calculating the corresponding loss based on the new seed image includes: respectively calculating a first loss, a second loss and a third loss based on the new seed image, and accumulating the first loss, the second loss and the third loss to obtain the loss corresponding to the new seed image; wherein the first loss is the loss of the new seed image in content relative to the first image, the second loss is the loss of the new seed image in style relative to the second image, and the third loss is a loss of image quality based on the new seed image.
The first loss is a content loss characterizing the texture similarity of the new seed image to the first image, the second loss is a style loss characterizing the style similarity of the new seed image to the second image, and the third loss is a constraint loss for evaluating the image quality of the new seed image output at each iteration. The loss function constructed by the method combines the first loss, the second loss and the third loss, gradually iterates by using a gradient descent method with the loss function as a target, and gradually optimizes the seed image in the image generation process.
In an optional embodiment, before inputting the first image, the second image, the first mask data, the second mask data, and the raw seed image into an image generation model for processing, the method further comprises: and generating the original seed image by using a random number, wherein the original seed image is a noise image, a white image or a black image.
In a second aspect, an embodiment of the present application provides an image generating apparatus, including: a data acquisition module, configured to acquire a first image, a second image, first mask data and second mask data, where each mask value in the first mask data corresponds to a category to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of the corresponding position on the second image; an image synthesis module, configured to input the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image; to calculate a corresponding loss based on the new seed image and update a weight parameter in the image generation model based on the loss; and to iteratively execute the process of inputting the first image, the second image, the first mask data, the second mask data and the new seed image obtained by the last processing into the image generation model for processing, calculating a corresponding loss based on the new seed image obtained by the current iteration, and updating the weight parameter in the image generation model based on the loss; and a target determining module, configured to select, after the iteration is finished, a new seed image from the new seed images obtained by the iterations according to the obtained losses, and take it as the final target image.
In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method according to the first aspect is performed.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of an image generation method provided in an embodiment of the present application;
FIG. 2 is a diagram of a new seed image output by the image generation model when the number of iterations is 100;
FIG. 3 is a diagram of a new seed image output by the image generation model when the number of iterations is 3100;
fig. 4 is a schematic diagram of an image generating apparatus provided in an embodiment of the present application;
fig. 5 is a schematic view of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
An embodiment of the present application provides an image generation method, and fig. 1 shows a flowchart of the image generation method, and as shown in fig. 1, the method includes the following steps:
step 110: the first image and the second image, and the first mask data and the second mask data are acquired.
The first image serves as the content source image required in image synthesis, and the second image serves as the style source image. The image generation task is, for example, to generate a new image from the content of the first image and the style of the second image, that is, an image showing the real objects of the first image in the environment (illumination, color and similar attributes) of the second image. First, the first image and the second image are acquired; then, the first mask data corresponding to the first image and the second mask data corresponding to the second image are automatically generated from them.
The first mask data comprises category information of a first image, the second mask data comprises color information of a second image, the first mask data and the second mask data are respectively composed of a plurality of mask values, and each mask value corresponds to one pixel point in the image. Each mask value in the first mask data corresponds to a class to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of a corresponding position on the second image.
The categories to which positions on the first image belong include, for example, "person", "car" and "horse". If a region on the first image belongs to the category "person" and the value assigned to "person" is 1, the mask value of every pixel in that region is 1; if a region belongs to the category "horse" and the value assigned to "horse" is 2, the mask value of every pixel in that region is 2; similarly, the mask value for regions of category "car" may be set to 3, and the mask value for regions without semantics may be set to 0. In this way, the first mask data can be obtained from the category of each instance on the first image. Of course, the categories in the first image are not limited to those mentioned in this embodiment; in practical applications, more categories may be included.
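For illustration only, the following sketch (not part of the patent; the input format and category ids are assumptions taken from the example above) shows how such first mask data could be assembled from per-instance segmentation results:

```python
# Minimal sketch: assemble first mask data from per-instance regions.
# Category ids follow the example above: person=1, horse=2, car=3,
# no-semantics=0. The "instances" input format is an assumption.
import numpy as np

CATEGORY_TO_MASK_VALUE = {"person": 1, "horse": 2, "car": 3}

def build_first_mask_data(instances, height, width):
    """instances: iterable of (category_name, boolean_region) pairs,
    where boolean_region is an (H, W) array marking one instance."""
    mask_data = np.zeros((height, width), dtype=np.uint8)  # 0 = no semantics
    for category, region in instances:
        mask_data[region] = CATEGORY_TO_MASK_VALUE.get(category, 0)
    return mask_data
```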
Optionally, in step 110, the step of acquiring the first mask data includes: preprocessing the first image with the second image segmentation model to obtain the first mask data. The second image segmentation model is obtained by training an instance segmentation network, which may be MASK-RCNN (or another instance segmentation network). Before preprocessing the first image with the second image segmentation model, the method further comprises:
A. Labeling multiple images, and obtaining, after labeling is finished, a second training sample set for training the instance segmentation network.
In the above labeling process, regions belonging to the same category on each image are labeled with the same mask value. Following the earlier example of this embodiment, a region belonging to a person is labeled 1, a region belonging to a horse is labeled 2, a region belonging to a car is labeled 3, and a region without semantics is labeled 0.
B. Training the instance segmentation network with the second training sample set, and obtaining the second image segmentation model after training is finished.
The mask values in the second mask data correspond to colors on the second image. Specifically, a color mapping table is preset, in which mask values corresponding to different color intervals are defined. For example, suppose a person appears on the second image with a mainly white region (the face), black hair and red clothing; if the mask value corresponding to white is 1, to black 2 and to red 3, then in the second mask data the mask value of every pixel in the white region of the second image is 1, that of every pixel in the black region is 2, and that of every pixel in the red region is 3. When the color mapping table is configured, mask values are assigned per color interval, i.e., similar colors within the same color interval share the same mask value. For example, if the main color is green and both light green and dark green are placed in the same color interval, then pixels in the second image that are light green or dark green receive the same mask value.
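As a sketch of this color-interval mapping (the RGB intervals below are illustrative assumptions, not values from the patent):

```python
# Minimal sketch: map each pixel of the second image to a mask value via a
# preset color mapping table; colors in the same interval share one value.
import numpy as np

# (lower RGB bound, upper RGB bound, mask value) -- illustrative intervals;
# e.g. light green and dark green would sit in one shared "green" interval.
COLOR_MAPPING_TABLE = [
    ((200, 200, 200), (255, 255, 255), 1),  # whites
    ((0, 0, 0), (60, 60, 60), 2),           # blacks
    ((150, 0, 0), (255, 80, 80), 3),        # reds
]

def build_second_mask_data(image_rgb):
    """image_rgb: (H, W, 3) uint8 array; returns (H, W) mask data."""
    mask = np.zeros(image_rgb.shape[:2], dtype=np.uint8)
    for lower, upper, value in COLOR_MAPPING_TABLE:
        inside = np.all((image_rgb >= lower) & (image_rgb <= upper), axis=-1)
        mask[inside] = value
    return mask
```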
Optionally, in step 110, the second mask data is obtained by preprocessing the second image with a first image segmentation model, where the first image segmentation model is obtained by training an instance segmentation network, for example MASK-RCNN (or another instance segmentation network). Before preprocessing the second image with the first image segmentation model, the method further comprises:
A. Labeling multiple images, and obtaining, after labeling is finished, a first training sample set for training the instance segmentation network.
In this process, mask value labeling is performed on the images according to the preset color mapping table: a labeling tool is used to assign the corresponding mask value to each position, and regions of an image that fall within the same color interval of the color mapping table are labeled with the same mask value. After labeling in this way, every training sample in the first training sample set has completed mask value labeling, and each training sample includes a training image and its corresponding color label.
B. Training the instance segmentation network with the first training sample set, and obtaining the first image segmentation model after training is finished.
The first image segmentation model and the second image segmentation model are two independent models and are trained separately. The two models may be obtained from the same instance segmentation network, or by training different instance segmentation networks. As the above steps show, by using the first and second image segmentation models, the mask data of the first image and of the second image can be generated automatically during image generation, without manually labeling mask values for the two images, which makes processing faster and more efficient.
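As a rough sketch of how such an instance segmentation model might be trained (the patent names MASK-RCNN but fixes no framework; the torchvision-based loop below, its dataset convention and its hyperparameters are all assumptions):

```python
# Hedged sketch: fine-tune a Mask R-CNN instance segmentation network.
# Targets follow torchvision's detection convention (dicts with 'boxes',
# 'labels', 'masks'); learning rate and epoch count are assumed values.
import torch
import torchvision

def train_segmentation_model(data_loader, num_classes, epochs=10):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        num_classes=num_classes)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    for _ in range(epochs):
        for images, targets in data_loader:
            loss_dict = model(images, targets)  # per-head training losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```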
After step 110, execution continues with step 120: and inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image.
In one embodiment, prior to step 120, the method further comprises: randomly generating the original seed image. For example, a two-dimensional image is generated with random numbers; this two-dimensional image is the original seed image. The original seed image may be a random noise image (the noise may be Gaussian noise, white noise, color noise, and so on), or a white or black image, i.e., an image in which the R, G and B values of every pixel are 255 or 0. The original seed image is an RGB three-channel image whose size equals the size required by the input of the image generation model; if the size differs, the original seed image must be resized before being input into the image generation model.
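A minimal sketch of this generation step (the use of NumPy and the mode names are assumptions):

```python
# Minimal sketch: generate the original seed image with random numbers,
# or as an all-white / all-black RGB three-channel image.
import numpy as np

def make_original_seed(height, width, mode="noise"):
    if mode == "noise":   # random noise image
        return np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
    if mode == "white":   # R, G, B of every pixel = 255
        return np.full((height, width, 3), 255, dtype=np.uint8)
    return np.zeros((height, width, 3), dtype=np.uint8)  # black: all zeros
```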
Optionally, in order to simplify the operation and reduce the calculation amount, the original seed image may also be preset in the storage space, and before performing step 120 each time, only the preset original seed image needs to be acquired from the storage space, and no separate generation is needed.
Step 130: corresponding losses are calculated based on the new seed image, and weight parameters in the image generation model are updated based on the losses.
After the first image, the second image, the first mask data, the second mask data and the original seed image are obtained, these data are input into the image generation model for processing, and a processed new seed image is obtained. Then, a corresponding loss is calculated based on the new seed image obtained by the current processing (each new seed image corresponds to one loss value), and the weight parameters in the image generation model are modified based on the calculated loss. The image generation model adopts a neural network; pre-trained networks such as VGG and RESNET can be selected.
Next, this embodiment adopts multiple iterations, continuously optimizing the seed image in each iteration so that it gradually approaches the desired target. During image generation, the pixel values of the seed image change continuously; its content and style change with every iteration, and finally one seed image is selected as the final synthesized target image.
After the image generation model outputs a processed new seed image in each iteration, the loss can be calculated according to a pre-constructed loss function. Optionally, the step of calculating the loss comprises: respectively calculating a first loss, a second loss and a third loss based on the new seed image, and accumulating the first loss, the second loss and the third loss to obtain the loss corresponding to the new seed image. The first loss is the loss of the new seed image in content relative to the first image, the second loss is the loss of the new seed image in style relative to the second image, and the third loss is a loss of image quality based on the new seed image.
In the processing of the image generation model, features are first extracted from the input images; then, according to the extracted features, the image content corresponding to those features is reproduced on the seed image, including reproducing the content of the first image and the style of the second image. The first loss is the content loss, which describes the difference in content between the output new seed image and the first image and characterizes their texture similarity: the smaller the first loss, the closer the content of the new seed image is to that of the first image; the larger the first loss, the greater the difference. The second loss is the style loss, which describes the difference in style between the output new seed image and the second image and characterizes their style similarity: the smaller the second loss, the closer the styles. A Laplace transform calculation is added to the style loss; using the Laplace transform, the matching of style and texture in the generated image (the output new seed image) is evaluated accurately, so that style and texture can be fused better. The third loss is a constraint loss used to evaluate the image quality of the new seed image output at each iteration.
The total loss is content_loss + style_loss + total_variation_loss. Specifically, each term is calculated as follows:
the first loss content _ loss is MSE (content _ layer, vars _ layer), and the calculation formula is: and performing MSE (mean square error) calculation on the feature map feature _ map output by the predefined variable layer and the feature map feature _ map output by the predefined invariable layer in the neural network to obtain a calculation result, wherein the calculation result is the value of content loss. Before this, some preset layers in the neural network are defined as variable layers, and some preset layers are defined as invariable layers, assuming that the image generation model adopts the neural network RESNET50, where the RESNET50 includes a plurality of Bottleneck modules bottleeck, in an alternative embodiment, RESNET50_ bottleeck _1_2, RESNET50_ bottleeck _2_3, RESNET50_ bottleeck _4_5 are defined as variable layers, and RESNET50_ bottleeck _3_4 is defined as invariable layers.
The second loss, style_loss, is calculated as follows:
1) Calculating the Laplace transform value of the new seed image obtained by the current iteration, recorded as L;
2) calculating, for each calculation layer predefined in the neural network, the gram matrix of the output feature_map and the segmentation result of the style image (the second image), to obtain a first calculation result recorded as gram_matrix_const, where the segmentation result of the style image is calculated from the style image and its mask data, and the size of the feature map output by each calculation layer is consistent with the size of the image;
3) calculating, for each calculation layer predefined in the neural network, the gram matrix of the output feature_map and the segmentation result of the content image (the first image), to obtain a second calculation result recorded as gram_matrix_var, where the segmentation result of the content image is calculated from the content image and its mask data;
Before 2) and 3) above, some preset layers of the neural network are defined as calculation layers for computing the loss value. Assuming the image generation model adopts the neural network RESNET50, RESNET50_bottleneck_1_2, RESNET50_bottleneck_2_3 and RESNET50_bottleneck_4_5 may be defined as calculation layers.
4) Performing MSE calculation on the first calculation result gram_matrix_const and the second calculation result gram_matrix_var to obtain a target array, recorded as the style_diff array;
5) summing the target array (the style_diff array) to obtain a summation result, recorded as SL;
6) calculating style_loss = L + SL to obtain the value of the style loss.
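A hedged sketch of steps 1)-6); the masked gram matrix, the scalar Laplacian summary and the per-layer segmentation inputs are assumptions where the patent text is terse:

```python
# Hedged sketch of style_loss = L + SL, following steps 1)-6) above.
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def laplace_value(seed):  # step 1): scalar L for a (1, C, H, W) seed image
    gray = seed.mean(dim=1, keepdim=True)
    return F.conv2d(gray, LAPLACIAN).abs().sum()

def gram_matrix(feature_map, segmentation):  # steps 2) / 3)
    # feature_map: (C, H, W); segmentation: (H, W) region weights (assumed)
    flat = (feature_map * segmentation).flatten(1)  # (C, H*W)
    return flat @ flat.t() / flat.numel()

def style_loss(seed, vars_features, style_features, content_seg, style_seg):
    sl = 0.0
    for name in style_features:  # the predefined calculation layers
        g_const = gram_matrix(style_features[name], style_seg[name])  # 2)
        g_var = gram_matrix(vars_features[name], content_seg[name])   # 3)
        sl = sl + F.mse_loss(g_const, g_var)  # 4) style_diff, 5) summed: SL
    return laplace_value(seed) + sl           # 6) style_loss = L + SL
```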
The third loss, total_variation_loss, is calculated as follows: the total variation of the new seed image output by the current iteration of the image generation model is calculated and multiplied by a preset weight constant to obtain total_variation_loss.
The total variation J is calculated by the following formula:
J = |y_{i+1,j} - y_{i,j}| + |y_{i,j+1} - y_{i,j}|;
where y denotes a pixel value of the image, i the abscissa of a pixel, j its ordinate, and y_{i,j} the pixel value of the pixel with coordinates (i, j). After the total variation J is calculated, it is multiplied by a preset weight constant to obtain the value of total_variation_loss.
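A minimal sketch of this term, with the differences summed over all pixels (`tv_weight` stands in for the preset weight constant, whose value the patent does not give):

```python
# Minimal sketch: total variation J times a preset weight constant (assumed).
import torch

def total_variation_loss(seed, tv_weight=1e-4):
    # seed: (1, C, H, W); J = |y[i+1,j]-y[i,j]| + |y[i,j+1]-y[i,j]|
    dh = (seed[:, :, 1:, :] - seed[:, :, :-1, :]).abs().sum()
    dw = (seed[:, :, :, 1:] - seed[:, :, :, :-1]).abs().sum()
    return tv_weight * (dh + dw)
```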
Among the above loss calculations, the computation of the gram matrix, the Laplace transform value and the MSE are conventional and will not be described further here.
The construction of the loss function is a key part of image generation; a different loss function makes the whole image generation model completely different from other existing models. The loss function constructed in this embodiment combines the first loss, the second loss and the third loss, restoring the style of the second image while restoring the content of the first image, and matching style to content accurately. Gradient descent is then applied iteratively with this loss function as the objective, gradually optimizing the seed image during image generation.
Step 140: and iteratively executing a process of inputting the first image, the second image, the first mask data, the second mask data and the new seed image obtained by the last processing into the image generation model for processing, calculating corresponding loss based on the new seed image obtained by the current iteration, and updating the weight parameter in the image generation model based on the loss.
The new seed image obtained by the last processing is input into the image generation model together with the originally acquired first image, second image, first mask data and second mask data, and processed again. In each iteration, the image generation model outputs a new seed image; a corresponding loss is calculated again from the output new seed image, the weight parameters in the image generation model are then updated again based on the calculated loss value, and in the next iteration the image generation model processes the inputs using the weight parameters updated in the previous iteration.
Step 140 is repeated until the iteration end condition is met.
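Putting steps 120 to 150 together, one possible reading is the classic style-transfer loop sketched below, which optimizes the seed image itself by gradient descent and keeps the lowest-loss result; the patent additionally speaks of updating weight parameters in the image generation model, and `compute_total_loss` is a hypothetical helper combining the three losses above:

```python
# Hedged sketch of the overall iteration: repeatedly process the seed image,
# compute the total loss, take a gradient step, and remember the best seed.
import torch

def generate_target_image(inputs, original_seed, compute_total_loss,
                          num_iterations=3000, lr=0.01):
    seed = original_seed.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([seed], lr=lr)
    best_loss, best_seed = float("inf"), original_seed
    for _ in range(num_iterations):
        optimizer.zero_grad()
        loss = compute_total_loss(inputs, seed)  # content + style + TV
        if loss.item() < best_loss:              # step 150: keep lowest loss
            best_loss, best_seed = loss.item(), seed.detach().clone()
        loss.backward()
        optimizer.step()
    return best_seed
```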
Step 150: and after the iteration is finished, selecting a new seed image from a plurality of new seed images obtained by multiple iterations according to the obtained loss, and taking the new seed image as a final target image.
The iteration end condition is, for example, stopping the iteration when the number of iterations reaches a set number, for example 2000, 3000 or another value. During the iterations, as many new seed images are generated as there are iterations, and each new seed image corresponds to one loss value. In a specific embodiment, the target image is selected as follows: the new seed image with the lowest loss value among the new seed images obtained over the iterations is determined as the final target image. Fig. 2 shows a new seed image output by the image generation model when the number of iterations is 100, and fig. 3 shows a new seed image output when the number of iterations is 3100; it can be seen that, as the number of iterations increases, the new seed image continuously approaches the desired synthesis effect.
Optionally, the iteration end condition may also be: stopping the iteration when the calculated loss value falls below a preset threshold. The preset threshold may be obtained from empirical values or from experiments. Note that the threshold must be set reasonably; if it is too small, the stop condition may never be satisfied, resulting in an infinite loop.
Optionally, before step 120, that is, before the first image, the second image, the first mask data, the second mask data, and the raw seed image are input to the image generation model for processing, the method further includes: the sizes of the first image, the second image, the first mask data, the second mask data, and the original seed image are all set to the size required for input of the image generation model. After the sizes of the image and the mask data are set, inputting the first image, the second image, the first mask data, the second mask data and the original seed image with the set sizes into an image generation model for processing. In the subsequent iteration process, the first image, the second image, the first mask data, the second mask data and the original seed image with the set sizes are input into the image generation model for processing. By the above processing, it is ensured that the size of each input sample data coincides with the size required for input of the image generation model.
In an alternative embodiment, an adjustment layer may be added in front of the neural network of the image generation model, and the size setting of the image and the mask data is realized through the adjustment layer.
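A minimal preprocessing sketch of this size-setting step (the adjustment layer mentioned above could equivalently perform it inside the network; the interpolation modes are assumptions):

```python
# Minimal sketch: set every input to the size the image generation model
# expects; nearest-neighbour for mask data keeps mask values discrete.
import torch.nn.functional as F

def set_input_size(tensor, size, is_mask=False):
    """tensor: (N, C, H, W) float tensor; returns the resized tensor."""
    if is_mask:
        return F.interpolate(tensor, size=size, mode="nearest")
    return F.interpolate(tensor, size=size, mode="bilinear",
                         align_corners=False)
```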
In summary, the image generation method provided by the embodiments of the present application uses color constraints and a reasonable loss function design, so that image synthesis is achieved better and a high-quality target image is obtained. During image generation, the neural network iterates the seed image continuously by gradient descent; during iteration, the loss value measuring the similarity between the seed image and the expected target decreases continuously, finally achieving image synthesis. Furthermore, the technical solution starts from the texture of the content source and the color of the style source, and uses a blank original seed image to fuse the two, combining two images into one. This process does not involve changing texture details; only the color of the corresponding region is changed based on the mask. For example, a square door frame of a building will only change color and will not be bent.
Based on the same inventive concept, an embodiment of the present application further provides an image generating apparatus, please refer to fig. 4, the apparatus includes:
a data obtaining module 210, configured to obtain a first image and a second image, and first mask data and second mask data, where each mask value in the first mask data corresponds to a category to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of the corresponding position on the second image;
an image synthesis module 220, configured to input the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image; to calculate a corresponding loss based on the new seed image and update a weight parameter in the image generation model based on the loss; and to iteratively execute the process of inputting the first image, the second image, the first mask data, the second mask data and the new seed image obtained by the last processing into the image generation model for processing, calculating a corresponding loss based on the new seed image obtained by the current iteration, and updating the weight parameter in the image generation model based on the loss;
and the target determining module 230 is configured to, after the iteration is finished, select a new seed image from the multiple new seed images obtained through multiple iterations according to the obtained loss, and use the new seed image as a final target image.
Optionally, the second mask data is obtained by preprocessing the second image with a first image segmentation model, and the first image segmentation model is obtained by training an instance segmentation network; the apparatus further includes: a first sample acquisition module, configured to acquire a first training sample set for training, where each training sample in the first training sample set has been labeled with mask values according to a preset color mapping table, and regions of the training samples that fall within the same color interval set in the color mapping table are labeled with the same mask value; and a first training module, configured to train the instance segmentation network with the first training sample set and obtain the first image segmentation model after training is finished.
Optionally, the first mask data is obtained by preprocessing the first image with a second image segmentation model, and the second image segmentation model is obtained by training an instance segmentation network; the apparatus further includes: a second sample acquisition module, configured to acquire a second training sample set for training, where each training sample in the second training sample set has been labeled with mask values and regions belonging to the same category on the training samples are labeled with the same mask value; and a second training module, configured to train the instance segmentation network with the second training sample set and obtain the second image segmentation model after training is finished.
Optionally, the apparatus further comprises: a size adjustment module, configured to set sizes of the first image, the second image, the first mask data, the second mask data, and the original seed image to a size required for input of the image generation model; the image synthesis module 220 is specifically configured to input the first image, the second image, the first mask data, the second mask data, and the original seed image after the size setting to the image generation model for processing.
Optionally, the target determining module 230 is specifically configured to determine, as a final target image, a new seed image with a lowest loss value from multiple new seed images obtained through multiple iterations.
Optionally, the image synthesis module 220 is further specifically configured to: respectively calculate a first loss, a second loss and a third loss based on the new seed image, and accumulate the first loss, the second loss and the third loss to obtain the loss corresponding to the new seed image; where the first loss is the loss of the new seed image in content relative to the first image, the second loss is the loss of the new seed image in style relative to the second image, and the third loss is a loss of image quality based on the new seed image.
The image generating apparatus provided above has the same basic principle and technical effect as those of the previous method embodiment, and for the sake of brief description, corresponding contents in the above method embodiment may be referred to where not mentioned in this embodiment, and are not described herein again.
Fig. 5 shows a possible structure of an electronic device 300 provided in an embodiment of the present application. Referring to fig. 5, the electronic device 300 includes: a processor 310, a memory 320, and a communication interface 330, which are interconnected and in communication with each other via a communication bus 340 and/or other form of connection mechanism (not shown).
The memory 320 includes one or more units (only one is shown in the figure), which may be, but are not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 310, and possibly other components, may access, read and/or write data to the memory 320.
The processor 310 includes one or more (only one shown) which may be an integrated circuit chip having signal processing capabilities. The Processor 310 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP), or other conventional processors; or a special-purpose Processor, including a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, and a discrete hardware component.
Communication interface 330 includes one or more (only one shown) that may be used to communicate directly or indirectly with other devices for the purpose of data interaction. Communication interface 330 may be an ethernet interface; may be a mobile communications network interface, such as an interface for a 3G, 4G, 5G network; or may be other types of interfaces having data transceiving functions.
One or more computer program instructions may be stored in the memory 320 and read and executed by the processor 310 to implement the steps of the image generation method provided by the embodiments of the present application and other desired functions.
It will be appreciated that the configuration shown in fig. 5 is merely illustrative and that electronic device 300 may include more or fewer components than shown in fig. 5 or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
The embodiment of the present application further provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor of a computer, the steps of the image generation method provided in the embodiment of the present application are executed. The computer-readable storage medium may be implemented as, for example, memory 320 in electronic device 300 in fig. 5.
The embodiment of the present application further provides a computer program product, which when running on a computer, causes the computer to execute the image generation method provided by the present embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. An image generation method, characterized in that the method comprises:
acquiring a first image, a second image, first mask data and second mask data, wherein each mask value in the first mask data corresponds to a category to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of the corresponding position on the second image;
inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image;
calculating corresponding loss based on the new seed image, and updating a weight parameter in the image generation model based on the loss;
iteratively executing a process of inputting the first image, the second image, the first mask data, the second mask data and a new seed image obtained by last processing into an image generation model for processing, calculating corresponding loss based on the new seed image obtained by current iteration, and updating a weight parameter in the image generation model based on the loss;
after iteration is finished, selecting a new seed image from a plurality of new seed images obtained by multiple iterations according to obtained loss, and taking the new seed image as a final target image;
the second mask data is obtained by preprocessing a second image by utilizing a first image segmentation model, and the first image segmentation model is obtained by training an instance segmentation network;
the method further comprises the following steps:
acquiring a first training sample set for training, wherein each training sample in the first training sample set is subjected to mask value labeling according to a preset color mapping table, and areas, located in the same color interval set in the color mapping table, of the training samples in the first training sample set are labeled with the same mask value;
and training an instance segmentation network by using the first training sample set, and obtaining the first image segmentation model after the training is finished.
2. The method of claim 1, wherein the first mask data is obtained by pre-processing the first image using a second image segmentation model, the second image segmentation model being obtained by training an instance segmentation network;
the method further comprises the following steps:
acquiring a second training sample set for training, wherein each training sample in the second training sample set is subjected to mask value labeling, and regions belonging to the same class on the training samples in the second training sample set are labeled with the same mask value;
and training the instance segmentation network by using the second training sample set, and obtaining the second image segmentation model after the training is finished.
3. The method of claim 1, wherein prior to inputting the first image, the second image, the first mask data, the second mask data, and a raw seed image into an image generation model for processing, the method further comprises:
setting the sizes of the first image, the second image, the first mask data, the second mask data and the original seed image to be the sizes required by the input of the image generation model;
the inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing includes: inputting the first image, the second image, the first mask data, the second mask data and the original seed image with the set sizes into an image generation model for processing.
4. The method according to claim 1, wherein selecting a new seed image from a plurality of new seed images obtained from a plurality of iterations according to the obtained loss as the final target image comprises:
and determining a new seed image with the lowest loss value from a plurality of new seed images obtained by a plurality of iterations as a final target image.
5. The method of claim 1, wherein calculating the corresponding loss based on the new seed image comprises:
respectively calculating a first loss, a second loss and a third loss based on the new seed image, and accumulating the first loss, the second loss and the third loss to obtain the loss corresponding to the new seed image; wherein the first loss is a loss of the new seed image in content based on the first image, the second loss is a loss of the new seed image in style based on the second image, and the third loss is a loss of image quality based on the new seed image.
6. The method of claim 1, wherein prior to inputting the first image, the second image, the first mask data, the second mask data, and a raw seed image into an image generation model for processing, the method further comprises:
and generating the original seed image by using a random number, wherein the original seed image is a noise image, a white image or a black image.
7. An image generation apparatus, comprising:
the data acquisition module is used for acquiring a first image, a second image, first mask data and second mask data, wherein each mask value in the first mask data corresponds to a category to which a corresponding position on the first image belongs, and each mask value in the second mask data corresponds to a color of the corresponding position on the second image;
the image synthesis module is used for inputting the first image, the second image, the first mask data, the second mask data and the original seed image into an image generation model for processing to obtain a processed new seed image; for calculating a corresponding loss based on the new seed image and updating a weight parameter in the image generation model based on the loss; and for iteratively executing the process of inputting the first image, the second image, the first mask data, the second mask data and the new seed image obtained by the last processing into the image generation model for processing, calculating a corresponding loss based on the new seed image obtained by the current iteration, and updating the weight parameter in the image generation model based on the loss;
the target determining module is used for selecting a new seed image from a plurality of new seed images obtained by multiple iterations according to the obtained loss after the iteration is finished, and taking the new seed image as a final target image;
the second mask data is obtained by preprocessing a second image by utilizing a first image segmentation model, and the first image segmentation model is obtained by training an instance segmentation network; the device further comprises:
a first sample acquisition module, used for acquiring a first training sample set for training, wherein each training sample in the first training sample set is subjected to mask value labeling according to a preset color mapping table, and areas of the training samples located in the same color interval set in the color mapping table are labeled with the same mask value;
and a first training module, used for training the instance segmentation network by using the first training sample set and obtaining the first image segmentation model after the training is finished.
8. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, performs the method according to any one of claims 1-6.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of any of claims 1-6.
CN202010227209.7A 2020-03-27 2020-03-27 Image generation method and device, storage medium and electronic equipment Active CN111340745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010227209.7A CN111340745B (en) 2020-03-27 2020-03-27 Image generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010227209.7A CN111340745B (en) 2020-03-27 2020-03-27 Image generation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111340745A CN111340745A (en) 2020-06-26
CN111340745B (en) 2021-01-05

Family

ID=71186356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010227209.7A Active CN111340745B (en) 2020-03-27 2020-03-27 Image generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111340745B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815506A (en) * 2020-07-17 2020-10-23 上海眼控科技股份有限公司 Image generation method and device, computer equipment and storage medium
CN113222114B (en) * 2021-04-22 2023-08-15 北京科技大学 Image data augmentation method and device
CN113837205B (en) * 2021-09-28 2023-04-28 北京有竹居网络技术有限公司 Method, apparatus, device and medium for image feature representation generation
CN114363519A (en) * 2022-01-10 2022-04-15 北京字跳网络技术有限公司 Image processing method and device and electronic equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3507773A1 (en) * 2016-09-02 2019-07-10 Artomatix Ltd. Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
CN106778928B (en) * 2016-12-21 2020-08-04 广州华多网络科技有限公司 Image processing method and device
US10152768B2 (en) * 2017-04-14 2018-12-11 Facebook, Inc. Artifact reduction for image style transfer
US10504267B2 (en) * 2017-06-06 2019-12-10 Adobe Inc. Generating a stylized image or stylized animation by matching semantic features via an appearance guide, a segmentation guide, and/or a temporal guide
US10832387B2 (en) * 2017-07-19 2020-11-10 Petuum Inc. Real-time intelligent image manipulation system
CN108596830B * 2018-04-28 2022-04-22 国信优易数据股份有限公司 Image style transfer model training method and image style transfer method
CN108961349A * 2018-06-29 2018-12-07 广东工业大学 Stylized image generation method, device, equipment and storage medium
CN112424834A (en) * 2018-08-01 2021-02-26 Oppo广东移动通信有限公司 Method and apparatus for image processing
US10789769B2 (en) * 2018-09-05 2020-09-29 Cyberlink Corp. Systems and methods for image style transfer utilizing image mask pre-processing
CN110008846B (en) * 2019-03-13 2022-08-30 南京邮电大学 Image processing method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552977B1 (en) * 2017-04-18 2020-02-04 Twitter, Inc. Fast face-morphing using neural networks
CN109325903A * 2017-07-31 2019-02-12 北京大学 Image stylization reconstruction method and device
CN108711137A * 2018-05-18 2018-10-26 西安交通大学 Image color expression pattern transfer method based on deep convolutional neural networks
CN108805803A * 2018-06-13 2018-11-13 衡阳师范学院 Portrait style transfer method based on semantic segmentation and deep convolutional neural networks
CN110660037A * 2018-06-29 2020-01-07 京东方科技集团股份有限公司 Method, apparatus, system and computer program product for face exchange between images
CN109345446A * 2018-09-18 2019-02-15 西华大学 Image style transfer algorithm based on dual learning
CN109697690A * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system
CN110852940A * 2019-11-01 2020-02-28 天津大学 Image processing method and related equipment
CN110866866A * 2019-11-14 2020-03-06 腾讯科技(深圳)有限公司 Image color-matching processing method and device, electronic device and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GANimation: Anatomically-Aware Facial Animation from a Single Image; Albert Pumarola et al.; ECCV 2018: Computer Vision - ECCV 2018; 2018-10-06; pp. 835-851, Section 4.1, Figs. 2-3 *
Digital synthesis of embroidery style based on convolutional neural networks; Zheng Rui et al.; Journal of Zhejiang University (Science Edition); 2019-05-31; Vol. 46, No. 3; pp. 270-278, Sections 1-2 *
Image style transfer technology based on convolutional neural networks; Dou Yaling et al.; Modern Computer (Professional Edition); 2018-10-31; pp. 47-51, 60 *
Image style transfer algorithms under deep convolutional neural networks; Li Hui et al.; Computer Engineering and Applications; 2019-09-25; Vol. 56, No. 2; pp. 176-183 *

Also Published As

Publication number Publication date
CN111340745A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN111340745B (en) Image generation method and device, storage medium and electronic equipment
US10489683B1 (en) Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
US10529078B2 (en) Locating and augmenting object features in images
US10152655B2 (en) Deep-learning network architecture for object detection
JP6843086B2 (en) Image processing systems, methods for performing multi-label semantic edge detection in images, and non-temporary computer-readable storage media
US20220084166A1 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
US8411948B2 (en) Up-sampling binary images for segmentation
US8655069B2 (en) Updating image segmentation following user input
US20220121931A1 (en) Direct regression encoder architecture and training
US8213726B2 (en) Image labeling using multi-scale processing
WO2022199583A1 (en) Image processing method and apparatus, computer device, and storage medium
CN110163239B (en) Weak supervision image semantic segmentation method based on super-pixel and conditional random field
Yang et al. Semantic portrait color transfer with internet images
US20230281763A1 (en) Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
US8351654B2 (en) Image processing using geodesic forests
US20220198671A1 (en) Utilizing a segmentation neural network to process initial object segmentations and object user indicators within a digital image to generate improved object segmentations
CN111127631B (en) Three-dimensional shape and texture reconstruction method, system and storage medium based on single image
US11507781B2 (en) Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
Qin et al. Automatic skin and hair masking using fully convolutional networks
CN114723583A (en) Unstructured electric power big data analysis method based on deep learning
KR20190044761A (en) Apparatus Processing Image and Method thereof
CN114444565A (en) Image tampering detection method, terminal device and storage medium
US20220108505A1 (en) Generating digital image editing guides by determining and filtering raster image content boundaries
US11200708B1 (en) Real-time color vector preview generation
Wong et al. Development of a refined illumination and reflectance approach for optimal construction site interior image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant