CN112967174B - Image generation model training, image generation method, image generation device and storage medium - Google Patents


Info

Publication number
CN112967174B
CN112967174B
Authority
CN
China
Prior art keywords
image
image generation
loss
style
discriminator
Prior art date
Legal status
Active
Application number
CN202110084071.4A
Other languages
Chinese (zh)
Other versions
CN112967174A (en)
Inventor
方慕园
张雷
万鹏飞
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110084071.4A
Publication of CN112967174A
Application granted
Publication of CN112967174B
Legal status: Active

Classifications

    • G06T3/10
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Abstract

The present disclosure relates to a training method of an image generation model, an image generation method, and related devices. The training method of the image generation model includes: obtaining at least two sample image sets; acquiring an image generation model and a discriminator corresponding to each image style, where the model includes an image generation network and a style vector corresponding to each image style; inputting a sample image and each style vector into the image generation network to obtain a stylized image corresponding to each style vector; inputting each stylized image into the corresponding discriminator to obtain a first discriminator loss and a second discriminator loss; obtaining an image generation loss according to the sample image, the stylized images, and the second discriminator losses; and training the corresponding discriminator according to the first discriminator loss, and training the image generation network and each style vector according to the image generation loss. The method and the device can complete multiple image style conversions with a single network, occupy fewer computing and storage resources, and preserve image detail better.

Description

Image generation model training, image generation method, image generation device and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to image generation model training, an image generation method, an image generation device, and a storage medium.
Background
In the related art, an input image can be transformed by a neural network to obtain a target image with a changed style. For example, a single encoder may be used to extract features of the input image, and the extracted features may then be input into a decoder to obtain the style-changed target image; however, a decoder is typically only able to generate images of one style. For another example, a neural network may be used for picture style conversion, but each conversion requires its own neural network. Therefore, if images of various styles are needed, multiple decoders or multiple neural networks are required, which obviously consumes a large amount of hardware resources.
Disclosure of Invention
The present disclosure provides image generation model training, image generation methods, apparatuses, and storage media to at least solve the problem in the related art that multiple decoders or multiple neural networks are required if multiple styles of images are desired. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a training method of an image generation model, including:
Acquiring at least two sample image sets, each sample image set having a different image style;
acquiring an image generation model to be trained and a discriminator corresponding to each image style, wherein the image generation model comprises an image generation network and a style vector corresponding to each image style; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style;
inputting sample images in each sample image set and each style vector into the image generation network to obtain a stylized image corresponding to each style vector;
inputting each stylized image into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network relative to the discriminator;
obtaining an image generation loss generated by the image generation network according to the sample image, each stylized image and the second discriminator loss corresponding to each discriminator;
training a corresponding discriminator according to the first discriminator loss, and training the image generation network and each of the style vectors according to the image generation loss;
and determining the trained image generation network and each trained style vector as the image generation model.
In an exemplary embodiment, inputting each stylized image into the corresponding discriminator to obtain the first discriminator loss and the second discriminator loss generated by the discriminator, where the first discriminator loss characterizes the discrimination accuracy loss of the discriminator and the second discriminator loss characterizes the image generation accuracy loss of the image generation network relative to the discriminator, includes the following steps:
inputting each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image;
and calculating the first discriminator loss according to the difference value between the discrimination result and the true value of the stylized image.
In an exemplary embodiment, inputting each stylized image into the corresponding discriminator to obtain the first discriminator loss and the second discriminator loss generated by the discriminator (the two losses being characterized as above) further includes:
calculating the second discriminator loss according to the difference between the discrimination result and the expected value of the stylized image.
In an exemplary embodiment, the obtaining the image generation loss generated by the image generation network according to the sample image, each stylized image, and the second discriminator loss corresponding to each discriminator includes:
according to the high-frequency components of any two stylized images, calculating to obtain high-frequency loss; the arbitrary two stylized images are generated based on the same sample image;
determining a target stylized image in each stylized image, wherein the target stylized image is generated by the image generation network according to a style vector corresponding to the image style of the sample image and the sample image;
calculating to obtain consistency loss according to pixel differences between the target stylized image and the sample image;
and obtaining the image generation loss according to the high-frequency loss, the consistency loss and the second discriminator loss corresponding to each discriminator.
In an exemplary embodiment, the method further comprises:
the image generation model is obtained through training when the image generation loss is smaller than a preset loss threshold;
or,
the image generation model is obtained when the number of training iterations of a training target is larger than a preset training-count threshold, where the training target comprises one or more of the discriminator, the image generation network, and the style vector.
According to a second aspect of the embodiments of the present disclosure, there is provided an image generating method, including:
acquiring an original image and a target style, wherein the target style represents a style corresponding to a target image to be generated;
according to the target style, determining a corresponding target style vector in an image generation model;
inputting the original image and the target style vector into an image generation network in the image generation model to obtain a target image with the target style;
wherein the image generation model is obtained according to the training method of the image generation model according to any one of the above first aspects.
According to a third aspect of embodiments of the present disclosure, there is provided an image generation model training apparatus, including:
a sample image set determination module configured to perform acquiring at least two sample image sets, each of the sample image sets having a different image style;
a training object acquisition module configured to perform acquisition of an image generation model to be trained and a discriminator corresponding to each of the image styles, the image generation model including an image generation network and a style vector corresponding to each of the image styles; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style;
an image generation module configured to perform inputting the sample images in each sample image set and each style vector into the image generation network to obtain a stylized image corresponding to each style vector;
a first loss calculation module configured to perform inputting each of the stylized images into a corresponding discriminator, resulting in a first discriminator loss and a second discriminator loss generated by the discriminators; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network relative to the discriminator;
a second loss calculation module configured to perform image generation loss generated by the image generation network according to the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators;
a training module configured to train the corresponding discriminator based on the first discriminator loss, and to train the image generation network and each of the style vectors based on the image generation loss; and to determine the trained image generation network and each trained style vector as the image generation model.
In an exemplary embodiment, the first loss calculation module is configured to perform inputting each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image; and calculating the first discriminator loss according to the difference value between the discrimination result and the true value of the stylized image.
In an exemplary embodiment, the first loss calculation module is configured to calculate the second discriminator loss according to a difference between the discrimination result and an expected value of the stylized image.
In an exemplary embodiment, the second loss calculation module includes:
a high-frequency loss calculation unit configured to perform calculation of a high-frequency loss from high-frequency components of any two stylized images; the arbitrary two stylized images are generated based on the same sample image;
a target stylized image determination unit configured to perform determination of a target stylized image among the respective stylized images, the target stylized image being generated by the image generation network from a style vector corresponding to an image style of the sample image and the sample image;
A consistency loss calculation unit configured to perform calculation of a consistency loss from pixel differences between the target stylized image and the sample image;
an image generation loss calculation unit configured to perform obtaining the image generation loss from the high-frequency loss, the consistency loss, and the second discriminator loss corresponding to each of the discriminators.
In an exemplary embodiment, the apparatus further comprises a training control module configured to perform training to obtain the image generation model when the image generation loss is less than a preset loss threshold; or when the training times of the training targets are larger than a preset training times threshold value, obtaining the image generation model, wherein the training targets comprise one or more of the discriminator, the image generation network or the style vector.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image generating apparatus including:
the generation element acquisition module is configured to acquire an original image and a target style, wherein the target style represents a style corresponding to a target image to be generated;
a target style vector acquisition module configured to perform determining a corresponding target style vector in an image generation model according to the target style;
a target image acquisition module configured to input the original image and the target style vector into the image generation network in the image generation model to obtain a target image having the target style;
wherein the image generation model is obtained according to the training method of the image generation model according to any one of the above first aspects.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the image generation model as described in any one of the above first aspects or the image generation method as described in the above second aspects.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, instructions in which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the image generation model as set forth in any one of the above-mentioned first aspects or the image generation method as set forth in the above-mentioned second aspect.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of an electronic device, which executes the computer instructions, causing the electronic device to perform the training method of the image generation model as described in any one of the above first aspects or the image generation method as described in the above second aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the embodiments of the present disclosure, an image generation model can be trained so that the trained image generation model outputs stylized images with different image styles. For the same input image, the image generation model can generate corresponding stylized images based on different style vectors, thereby completing multiple style conversion tasks with a single network, occupying fewer computing and storage resources, and preserving image detail better.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram illustrating a style conversion scheme in the related art according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating another style conversion scheme in the related art according to an exemplary embodiment;
FIG. 3 is an application environment diagram illustrating an image generation method according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a training method for an image generation model, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating inputting each of the stylized images into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator, according to an exemplary embodiment;
FIG. 6 is a flow chart illustrating obtaining image generation loss generated by the image generation network based on the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators, according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating calculating a high-frequency loss from the high-frequency components of any two stylized images generated from the same sample image, according to an exemplary embodiment;
FIG. 8 is a flowchart illustrating an image generation method according to an exemplary embodiment;
FIG. 9 is a schematic diagram illustrating an image generation method according to an exemplary embodiment;
FIG. 10 is a block diagram of an image generation model training apparatus, according to an example embodiment;
FIG. 11 is a block diagram of an image generation apparatus according to an exemplary embodiment;
fig. 12 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
Style conversion changes the details and style of an image while keeping its main content, for example changing the apparent age of a face in the image, changing the skin tone or hair color, or converting the image into an animation (two-dimensional) style. In the related art, style conversion can be realized through a neural network, and the overall details of the generated image after style conversion remain unchanged relative to the image before conversion; for example, the facial shape and facial contours of a person are basically unchanged before and after the conversion.
Referring to fig. 1, a schematic diagram of a style conversion scheme in the related art is shown, in which an image of style A is converted into an image of style B by the neural network 1, and an image of style B is converted into an image of style A by the neural network 2.
Referring to fig. 2, a schematic diagram of another style conversion scheme in the related art is shown, in which feature extraction is performed on various style images, an image with a style a is output through the neural network 3, and an image with a style B is output through the neural network 4.
The style conversion schemes shown in fig. 1 and fig. 2 each require a dedicated neural network for each conversion; if multiple style conversions are needed, multiple neural networks are obviously required, which occupies and wastes both storage resources and computing resources. In addition, how well style conversion in the related art preserves image texture details still needs to be improved.
In order to generate images with multiple image styles based on one neural network, save resources and maintain consistency of image texture details to a large extent, the present disclosure provides a training method of an image generation model, and an image generation method based on the image generation model.
Referring to fig. 3, an application environment diagram of an image generation method is shown, which may include a terminal 110 and a server 120, according to an exemplary embodiment.
The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, etc. The terminal 110 may run a client that is served by the server 120 in the background. The client may send an original image and an image generation style to the server 120, and acquire and display the target image with that image generation style returned by the server 120. For example, the client may capture a face image and determine the two-dimensional (anime) style, send the face image and the two-dimensional style to the server 120, and obtain the face image with the two-dimensional style returned by the server 120.
The server 120 may obtain a target image according to the original image and the image generation style transmitted by the client 110 and relying on the image generation model, and send the target image to the client 110. The server 120 may also be used to train the image generation model and save the trained image generation model.
The server 120 shown in fig. 3 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, and the terminal 110 and the server 120 may be connected through a wired network or a wireless network.
Fig. 4 is a flowchart illustrating a training method of an image generation model according to an exemplary embodiment. As shown in fig. 4, the method is described as applied to the server 120 shown in fig. 3 and includes the following steps.
In step S10, at least two sample image sets are acquired, each of the sample image sets having a different image style.
The specific content of the image style is not limited in this disclosure. By way of example, the image style may be an artistic style, such as an artistic genre style, an artistic era style, or a custom style. For example, artistic genre styles may include the impressionist genre, the abstract genre, the pictorial genre, and the like; artistic era styles may include a traditional classical style, a modern pictorial style, and the like. The image style may also include expression styles such as a cartoon style, a sketch style, an oil painting style, a traditional Chinese painting style, a colored-pencil style, and the like. The image style may also characterize some fixed change to an element of the image, such as changing the color of a face in the image to gray or changing the hair in the image to a lighter color.
In the examples below, the technical solutions of the present disclosure are illustrated by taking two image styles, namely image style A and image style B, as an example.
For each image style, a sample image set may be determined, the sample images in the sample image set having the image style.
For image style A, a sample image set 1 is determined, and the image style of each sample image in sample image set 1 is A. For image style B, a sample image set 2 is determined, and the image style of each sample image in sample image set 2 is B.
In step S20, acquiring an image generation model to be trained and a discriminator corresponding to each image style, where the image generation model includes an image generation network and a style vector corresponding to each image style; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style.
Illustratively, image style A corresponds to a style vector s_A and a discriminator D_A, and image style B corresponds to a style vector s_B and a discriminator D_B. The style vectors s_A and s_B are part of the image generation model and can be used to trigger the image generation network to generate stylized images corresponding to image style A or image style B, respectively.
A generative adversarial network (GAN, Generative Adversarial Networks) is a deep learning model. A GAN achieves its training objective through adversarial game playing between a generative model (Generative Model) and a discriminative model (Discriminative Model) within its framework, thereby optimizing the performance of the generative model. In the original GAN theory, the generative model and the discriminative model are not required to be neural networks; they only need to fit the corresponding generation and discrimination functions. In practice, deep neural networks are usually used as the generative model and the discriminative model. Of course, in some scenarios, a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network) may also be used.
In fact, the specific structures of the generative model and the discriminative model can vary, as long as they can fit the corresponding generation and discrimination functions. In the embodiments of the disclosure, the image generation model serves as the generative model, and the discriminator corresponding to each image style serves as the discriminative model.
In an exemplary embodiment, the discriminator in step S20 may be obtained based on the discriminative model of a generative adversarial network, and any one of a deep neural network, a convolutional neural network, and a recurrent neural network may be used. The discriminators corresponding to different image styles may use the same or different neural networks; even if the same type of neural network is used, the detailed structures may be the same or different, which is not limited by the present disclosure.
Exemplary, discriminantFor judging whether the image is generated by the image generating network, discriminator +.>The loss generated by the discrimination result of the image which is the style A and is true and is not obtained by the image generating network is smaller, and the loss generated by the discrimination result of the image which is not the style A is larger; correspondingly, discriminator->Also for determining whether an image is generated by an image generation network, discriminator +. >The loss generated by the discrimination result for the image of the style B and the image which is true and is not obtained by the image generation network is small, and the loss generated by the discrimination result for the image of the style other than B is large.
In step S30, the sample images in each sample image set and the respective style vectors are input to the image generation network, and a stylized image corresponding to each style vector is obtained.
In the embodiment of the present disclosure, the image generating network in step S30 may be obtained based on the generating model of the above-described generating type countermeasure network model, and any one of a deep neural network, a convolutional neural network, and a recurrent neural network may be used. The structure of the image generation network and the respective discriminators may be the same or different, and this disclosure is not limited thereto. Even though the image generation network uses the same neural network as any of the above-described discriminators, the detailed structure of the specific neural network may be the same or different, which is not limited by the present disclosure.
Taking any sample image P in the sample image set 1 corresponding to image style A as an example, the sample image P, the style vector s_A, and the style vector s_B are input into the image generation network, obtaining the images P_A and P_B output by the image generation network, where image P_A corresponds to style vector s_A and image P_B corresponds to style vector s_B.
Likewise, taking any sample image Q in the sample image set 2 corresponding to image style B as an example, the sample image Q, the style vector s_A, and the style vector s_B are input into the image generation network, obtaining the images Q_A and Q_B output by the image generation network, where image Q_A corresponds to style vector s_A and image Q_B corresponds to style vector s_B.
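A short sketch of step S30 under the same assumptions as above (the names generator, style_vectors, and the dummy sample batches are illustrative): each sample image is paired with every style vector, so a style-A sample P yields P_A and P_B, and a style-B sample Q yields Q_A and Q_B.

def stylize_all(generator, style_vectors, sample):
    # Run the shared image generation network once per style vector.
    return {name: generator(sample, vec) for name, vec in style_vectors.items()}

# Dummy batches standing in for sample image set 1 (style A) and set 2 (style B).
sample_P = torch.randn(4, 3, 128, 128)
sample_Q = torch.randn(4, 3, 128, 128)

stylized_P = stylize_all(generator, style_vectors, sample_P)   # {"A": P_A, "B": P_B}
stylized_Q = stylize_all(generator, style_vectors, sample_Q)   # {"A": Q_A, "B": Q_B}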
In step S40, inputting each stylized image into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network with respect to the discriminator.
Illustratively, the images P_A and Q_A are input into the discriminator D_A, and the images P_B and Q_B are input into the discriminator D_B. Inputting the images with style vector s_A output by the image generation network into the discriminator D_A yields the first discriminator loss and the second discriminator loss generated by D_A; similarly, inputting the images with style vector s_B output by the image generation network into the discriminator D_B yields the first discriminator loss and the second discriminator loss generated by D_B.
In the embodiment of the disclosure, the first discriminator loss and the second discriminator loss are calculated according to the discrimination result output by the discriminator. Wherein the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the higher the accuracy of the discriminator identification, the smaller the first discriminator loss. Illustratively, if the discrimination output by the discriminator is 0.6 and the accurate discrimination is 0, the first discriminator loss is 0.6.
The second discriminator loss characterizes an image generation accuracy loss of the image generation network with respect to the discriminator, and a lower second discriminator loss indicates that the image generated by the image generation network is easier to fool the discriminator, that is, the higher the image generation accuracy of the image generation network with respect to the discriminator is, and accordingly, the lower the image generation accuracy loss of the image generation network with respect to the discriminator is. Illustratively, if the discrimination result output by the discriminator is 0.6 and the discrimination result expected by the image generation network is 1, the second discriminator loss is 0.4.
In one embodiment, please refer to fig. 5, fig. 5 is a flowchart illustrating inputting each of the stylized images into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator, according to an exemplary embodiment, which includes:
In step S41, inputting each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image.
In step S42, the first discriminator loss is calculated from the difference between the discrimination result and the true value of the stylized image.
Illustratively, take inputting the image P_A into the discriminator D_A as an example. If inputting P_A into D_A yields a discrimination result of 0.6, this indicates that the discriminator D_A considers the probability that image P_A is a real image (in this disclosure, an image not generated by the image generation network is a real image) to be 0.6.
In fact, image P_A is an image generated by the image generation network, so its true value is 0 (the true value is 1 if the image is not generated by the image generation network, and 0 otherwise), and the first discriminator loss is therefore 0.6.
The performance of the discriminator can be accurately evaluated by calculating the first discriminator loss, so that the parameters of the corresponding discriminator can be adjusted according to the first discriminator loss, and the discriminating capacity of the discriminator can be improved.
In step S43, the second discriminator loss is calculated from the difference between the discrimination result and the expected value of the stylized image.
The expected value in the embodiments of the disclosure represents the degree of fidelity of the image generated by the image generation network, and also embodies the training objective of the image generation network, namely that the generated image should pass as real and fool the discriminator.
Illustratively, again take inputting the image P_A into the discriminator D_A as an example. If inputting P_A into D_A yields a discrimination result of 0.6, this indicates that the discriminator D_A considers the probability that image P_A is a real image (in this disclosure, an image not generated by the image generation network is a real image) to be 0.6.
In fact, image P_A is an image generated by the image generation network, so its expected value is 1 (the expected value is 1 if the image is generated by the image generation network, and 0 otherwise), and the second discriminator loss is therefore 0.4.
Calculating the second discriminator loss generated by the discriminator makes it possible to accurately evaluate, from the discriminator's perspective, the precision loss of the images generated by the image generation network. Applying the second discriminator loss to the training of the image generation network realizes mutual adversarial training and joint optimization of the discriminator and the image generation network.
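The two losses can be written down directly from the numeric examples above, which measure the absolute difference between the discrimination result and the true value (first discriminator loss) or the expected value (second discriminator loss). The sketch below follows that absolute-difference reading and reuses the illustrative names defined earlier; it is one possible formulation, not the only one covered by the disclosure.

def first_discriminator_loss(disc, stylized_image, real_image):
    # Step S42: discrimination-accuracy loss. Generated images have true value 0,
    # real sample images have true value 1; the stylized image is detached so this
    # loss only updates the discriminator.
    fake_pred = disc(stylized_image.detach())
    real_pred = disc(real_image)
    return fake_pred.abs().mean() + (1.0 - real_pred).abs().mean()

def second_discriminator_loss(disc, stylized_image):
    # Step S43: image-generation-accuracy loss. The expected value of a generated
    # image is 1, i.e. the generator wants its output to be judged real.
    return (1.0 - disc(stylized_image)).abs().mean()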
In step S50, an image generation loss generated by the image generation network is obtained from the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators.
In one embodiment, referring to fig. 6, fig. 6 is a flowchart illustrating an image generation loss generated by the image generation network according to the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators according to an exemplary embodiment, where the flowchart includes:
in step S51, a high-frequency loss is calculated according to the high-frequency components of any two stylized images; any two stylized images described above are generated based on the same sample image.
Referring to fig. 7, fig. 7 is a flowchart illustrating calculating the high-frequency loss from the high-frequency components of any two stylized images generated from the same sample image, according to an exemplary embodiment, and it includes the following steps:
in step S511, the high-frequency component of each stylized image is calculated.
For each stylized image, the stylized image may be blurred according to a preset blurring parameter to obtain a corresponding blurred image, and the difference between the stylized image and the blurred image is taken as the high-frequency component of the stylized image. Taking the stylized image P_A as an example, blurring P_A yields P_A', and P_A - P_A' is taken as the high-frequency component.
The present disclosure is not limited to specific values of the blurring parameters and specific blurring methods, and a user may set according to actual situations.
In step S512, for any two high frequency components, a first statistical value is calculated from the two high frequency components.
In the present disclosure, the first statistical value may be the MSE (Mean Square Error), RMSE (Root Mean Square Error), or MAE (Mean Absolute Error); the present disclosure does not limit the specific form of the first statistical value.
In step S513, the sum of the calculated first statistics is determined as the high-frequency loss.
In this way, by calculating the high-frequency loss, the loss generated by the stylized images in the high-frequency-component dimension can be accurately measured, which improves the calculation accuracy of the image generation loss.
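A sketch of steps S511 to S513 under the same assumptions as before (the box blur and kernel size stand in for whatever blurring method and parameters the user chooses, and MSE is used as the first statistical value):

import torch.nn.functional as F

def high_frequency(img, kernel_size=5):
    # Step S511: high-frequency component = image minus its blurred version.
    blurred = F.avg_pool2d(img, kernel_size, stride=1, padding=kernel_size // 2)
    return img - blurred

def high_frequency_loss(stylized_images):
    # Steps S512-S513: sum the MSE between the high-frequency components of every
    # pair of stylized images generated from the same sample image.
    hf = [high_frequency(img) for img in stylized_images]
    loss = 0.0
    for i in range(len(hf)):
        for j in range(i + 1, len(hf)):
            loss = loss + F.mse_loss(hf[i], hf[j])
    return loss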
In step S52, a target stylized image is determined among the respective stylized images, and the target stylized image is generated by the image generation network from the sample image and a style vector corresponding to the image style of the sample image.
For example, for the sample image P from sample image set 1, whose image style is A, the target style vector is s_A. The stylized image P_A generated from the sample image P corresponds to the style vector s_A; obviously, the style vector corresponding to the stylized image P_A is exactly the target style vector s_A. Thus, for the sample image P, the stylized image P_A is the target stylized image.
In step S53, a consistency loss is calculated from the pixel difference between the target stylized image and the sample image.
In embodiments of the present disclosure, the consistency loss may be a second statistical value of the pixel differences between the target stylized image P_A and the sample image P. In the present disclosure, the second statistical value may be the MSE (Mean Square Error), RMSE (Root Mean Square Error), or MAE (Mean Absolute Error); the present disclosure does not limit the specific form of the second statistical value.
In this way, the consistency loss can be accurately measured by calculating the second statistical value of the pixel differences between the target stylized image and the sample image, which improves the calculation accuracy of the image generation loss.
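Continuing the sketch, the consistency loss of step S53 with MSE as the second statistical value:

def consistency_loss(target_stylized, sample):
    # Step S53: pixel-difference statistic between the target stylized image
    # (generated with the style vector matching the sample's own style) and the sample.
    return F.mse_loss(target_stylized, sample)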
In step S54, the image generation loss is obtained from the high-frequency loss, the consistency loss, and the second discriminator loss corresponding to each of the discriminators.
In one embodiment, the high frequency loss, the consistency loss, and the second discriminator loss corresponding to each of the discriminators may be weighted and summed to obtain an image generation loss. The present disclosure is not limited to specific numerical values of the weights, and may be set according to actual requirements.
The image generation loss is accurately calculated from the high-frequency loss, the consistency loss, and the second discriminator loss corresponding to each discriminator, so that the performance of the image generation network and each style vector can be accurately evaluated; the parameters of the image generation network and each style vector can then be adjusted according to the image generation loss, improving the image generation capability.
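One possible form of the weighted sum in step S54; the weight values below are purely illustrative, since the disclosure leaves them to be set according to actual requirements.

W_HF, W_CONS, W_ADV = 1.0, 10.0, 1.0   # illustrative weights

def image_generation_loss(hf_loss, cons_loss, second_disc_losses):
    # Step S54: weighted sum of the high-frequency loss, the consistency loss,
    # and the second discriminator losses (one per discriminator).
    return W_HF * hf_loss + W_CONS * cons_loss + W_ADV * sum(second_disc_losses)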
In step S60, the corresponding discriminators are trained based on the first discriminators loss, and the image generation network and the respective style vectors are trained based on the image generation loss.
In the present disclosure, the first discriminator loss is reduced by adjusting the parameters of the discriminators, and the image generation loss is reduced by adjusting the parameters of the image generation network and the style vectors. That is, based on the idea of gradient descent, the first discriminator loss corresponding to each discriminator is reduced by adjusting that discriminator's parameters, and the image generation loss is reduced by adjusting the parameters of the image generation network and each style vector.
The parameters of each discriminator, the parameters of the image generation network, and each style vector may be adjusted based on a gradient descent method in the present disclosure. The gradient descent method is an iterative method that can be used to solve least squares problems (both linear and nonlinear). For optimizing the parameters of a neural network during training, i.e., solving an unconstrained optimization problem, gradient descent is one of the most commonly adopted methods. When solving for the minimum of the loss function, the gradient descent method approaches the minimum of the loss function through step-by-step iteration, thereby optimizing the parameters of the neural network.
Specifically, GD (Gradient Descent), BGD (Batch Gradient Descent), SGD (Stochastic Gradient Descent), or MBGD (Mini-Batch Gradient Descent) may be used, which is not limited by the present disclosure.
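Putting the pieces together, the following sketch shows one possible alternating training step for step S60, reusing the helper functions defined above. Adam is used here only as an example of a gradient-descent variant; the disclosure allows any of the methods listed above, and the learning rates are illustrative.

import itertools

disc_params = itertools.chain(*(d.parameters() for d in discriminators.values()))
opt_disc = torch.optim.Adam(disc_params, lr=1e-4)
opt_gen = torch.optim.Adam(
    list(generator.parameters()) + list(style_vectors.parameters()), lr=1e-4)

def training_step(sample_P, sample_Q):
    samples = {"A": sample_P, "B": sample_Q}   # each batch keyed by its own style
    stylized = {src: stylize_all(generator, style_vectors, img) for src, img in samples.items()}

    # Train each discriminator with its first discriminator loss.
    opt_disc.zero_grad()
    d_loss = sum(first_discriminator_loss(discriminators[s], stylized[src][s], samples[s])
                 for src in samples for s in STYLES)
    d_loss.backward()
    opt_disc.step()

    # Train the image generation network and the style vectors with the image generation loss.
    opt_gen.zero_grad()
    g_loss = 0.0
    for src, imgs in stylized.items():
        hf = high_frequency_loss(list(imgs.values()))
        cons = consistency_loss(imgs[src], samples[src])   # target stylized image vs. its sample
        adv = [second_discriminator_loss(discriminators[s], imgs[s]) for s in STYLES]
        g_loss = g_loss + image_generation_loss(hf, cons, adv)
    g_loss.backward()
    opt_gen.step()
    return d_loss.item(), g_loss.item()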
In one embodiment, the number of training iterations of a training target may be counted, where the training target includes any one or more of the discriminators, the image generation network, and the style vectors; if the number of training iterations is greater than a preset threshold of training times, it is determined that the training stop condition is reached. Illustratively, the training stop condition in this embodiment may be that the number of training iterations of the parameters of any one discriminator reaches a first preset number, or that the number of training iterations of the parameters of the image generation network reaches a second preset number. The first preset number and the second preset number may be equal and may be set according to the actual needs of the user; the present disclosure does not limit their specific values.
In another embodiment, the training stop condition may be that the image generation loss is smaller than a preset loss threshold, and the preset loss threshold may be set according to the actual requirement of the user, and the disclosure is not limited to specific values thereof.
In step S70, the trained image generation network and the trained style vectors are determined as the image generation model.
According to the embodiments of the disclosure, an image generation model can be trained so that the trained image generation model outputs stylized images with different image styles. For the same input image, the image generation model can generate stylized images of different styles based on different style vectors, thereby completing multiple style conversion tasks and achieving multiple image style conversions with a single network, while occupying fewer computing and storage resources and preserving image detail better.
According to the embodiments of the disclosure, the image generation model is trained based on the adversarial idea; after training ends, the discriminator can hardly distinguish whether an image was generated by the image generation model, so the generated images can pass as real. The training method is therefore less affected by the image quality in the sample image sets, and the image details, contours, and textures of the sample images can be largely preserved in the generated images. In contrast, a neural network for image style conversion in the related art is usually not obtained through adversarial training and is strongly affected by the quality of the sample images; when preparing sample images, it is difficult to find images of different styles with completely consistent content, i.e., the quality of the sample images is unstable, so the related-art network preserves image details, contours, and textures only to a limited extent, and the converted images exhibit a certain degree of distortion. Therefore, the image generation model trained by the embodiments of the disclosure achieves higher image conversion quality than the neural networks used for image conversion in the related art.
Based on the trained image generation model described above, the present disclosure further provides an image generation method. As shown in fig. 8, which is a flowchart of an image generation method according to an exemplary embodiment, the method includes:
in step S10-1, an original image and a target style representing a style corresponding to a target image to be generated are acquired.
For example, if a sample image set of the two-dimensional (anime) style, a sample image set of the sketch style, a sample image set of the oil painting style, and a sample image set of the sticker style are used during training of the image generation model, the image generation model can support converting an input original image into the two-dimensional style, the sketch style, the oil painting style, or the sticker style. The image generation model then includes the image generation network together with a two-dimensional style vector, a sketch style vector, an oil painting style vector, and a sticker style vector. Accordingly, in step S10-1, the target style may be any one of the two-dimensional style, the sketch style, the oil painting style, or the sticker style.
In some embodiments, the image style selection interface may be generated according to various styles that can be supported by the image generation model, and the target style may be determined according to the selection operation in response to the selection operation of the image style selection interface by the user.
For example, the server 120 may transmit the styles supported by the image generation model to the client 110, and the client 110 may generate the image style selection interface. For example, the image style selection interface may use a radio button group to present the styles supported by the image generation model, each radio button corresponding to one supported style. If the user clicks one of the radio buttons in the image style selection interface, the style corresponding to that radio button is taken as the target style. The client 110 may then transmit the original image and the target style to the server 120.
In the present disclosure, the style of the original image may or may not belong to the style supported by the image generation model.
In step S20-1, corresponding target style vectors are determined in the image generation model according to the target styles.
In the present disclosure, the style corresponding to the target style vector is the same as the style corresponding to the target image.
Taking the above image generation model as an example, the image generation model includes a two-dimensional style vector, a sketch style vector, an oil painting style vector, and a sticker style vector. If the two-dimensional style is selected as the style corresponding to the target image in step S10-1, the style vector of the two-dimensional style in the image generation model is used as the target style vector.
In step S30-1, the original image and the target style vector are input to an image generation network in the image generation model, and a target image having the target style is obtained.
The target image and the original image have the same image content, and the image details, textures, and contours remain basically unchanged, but the style of the target image is the target style acquired in step S10-1. Taking the above image generation model as an example, if the original image is the Mona Lisa painted by Da Vinci and the target style determined in step S10-1 is the two-dimensional style, the target image is a two-dimensional-style image of the Mona Lisa.
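At inference time, the trained model is used exactly as described in steps S10-1 to S30-1. A minimal sketch under the same assumptions as the training sketches (the trained generator and style_vectors stand in for the saved image generation model):

@torch.no_grad()
def generate(generator, style_vectors, original_image, target_style):
    target_vec = style_vectors[target_style]       # step S20-1: look up the target style vector
    return generator(original_image, target_vec)   # step S30-1: run the image generation network

# Example: convert an arbitrary-style image (a dummy 128x128 RGB tensor here) into style "A".
original = torch.randn(1, 3, 128, 128)
target_image = generate(generator, style_vectors, original, "A")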
The target image is obtained by the image generation model, which is trained based on the adversarial idea; after training, the discriminator can hardly distinguish whether an image was generated by the image generation model, so the generated images can pass as real. That is, the image generation model can perform style conversion on an image while preserving the image details, textures, and contours of the original image in the target image to a large extent, which yields good results in practical applications.
Referring to fig. 9, fig. 9 is a schematic diagram of an image generation method according to an exemplary embodiment. For an input image of any style, one style (the target style) may be selected from the image styles supported by the image generation model and used as the style of the corresponding target image. A target style vector is determined in the image generation model according to the target style, and the input image and the target style vector are input together into the image generation network in the image generation model to obtain the target image output by the image generation network, where the target image has the target style.
If the image generation model supports the two-dimensional style, the sketch style, the oil painting style, and the sticker style, an image of the traditional Chinese painting style can be input into the image generation model. Selecting the two-dimensional style as the image generation style yields a target image of the two-dimensional style; selecting the sketch style yields a target image of the sketch style; selecting the oil painting style yields a target image of the oil painting style; and selecting the sticker style yields a target image of the sticker style.
Based on the image generation model, the embodiments of the disclosure can generate a target image with the target style for an original image of any style; and since the image generation model supports multiple image styles, the image generation style can be any of them, i.e., multiple target images with the corresponding target styles can be obtained from a single image.
FIG. 10 is a block diagram illustrating an image generation model training apparatus, according to an example embodiment. Referring to fig. 10, the apparatus includes:
a sample image set determination module 10 configured to perform acquiring at least two sample image sets, each of the sample image sets having a different image style;
a training object acquisition module 20 configured to perform acquisition of an image generation model to be trained including an image generation network and a style vector corresponding to each of the image styles, and a discriminator corresponding to each of the image styles; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style;
an image generation module 30 configured to perform inputting of a sample image in each sample image set and each of the style vectors into the image generation network, to obtain a stylized image corresponding to each of the style vectors;
A first loss calculation module 40 configured to input each of the stylized images into a corresponding discriminator, and obtain a first discriminator loss and a second discriminator loss generated by the discriminators; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network with respect to the discriminator;
a second loss calculation module 50 configured to obtain an image generation loss generated by the image generation network based on the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators;
a training module 60 configured to train the corresponding discriminator based on the first discriminator loss, train the image generation network and each of the style vectors based on the image generation loss, and determine the trained image generation network and the trained style vectors as the image generation model.
In an exemplary embodiment, the first loss calculation module is configured to input each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image; and calculating the first discriminant loss according to the difference between the discrimination result and the true value of the stylized image.
In an exemplary embodiment, the first loss calculation module is configured to calculate the second discriminator loss according to a difference between the discrimination result and an expected value of the stylized image.
In an exemplary embodiment, the second loss calculation module includes:
a high-frequency loss calculation unit configured to perform calculation of a high-frequency loss from high-frequency components of any two stylized images; any two stylized images are generated based on the same sample image;
a target stylized image determination unit configured to perform determination of a target stylized image among the respective stylized images, the target stylized image being generated by the image generation network from a style vector corresponding to an image style of the sample image and the sample image;
a consistency loss calculation unit configured to perform calculation of a consistency loss based on a pixel difference between the target stylized image and the sample image;
an image generation loss calculation unit configured to obtain the image generation loss based on the high-frequency loss, the consistency loss, and the second discriminator loss corresponding to each of the discriminators.
In an exemplary embodiment, the apparatus further includes a training control module configured to perform training to obtain the image generation model when the image generation loss is less than a preset loss threshold; or when the training times of the training target are larger than a preset training times threshold value, obtaining the image generation model, wherein the training target comprises one or more of the discriminator, the image generation network or the style vector.
Fig. 11 is a block diagram of an image generating apparatus according to an exemplary embodiment. Referring to fig. 11, the apparatus includes:
a generating element acquisition module 10-1 configured to acquire an original image and a target style, the target style representing a style corresponding to a target image to be generated;
a target style vector acquisition module 20-1 configured to determine a corresponding target style vector in the image generation model according to the target style;
a target image acquisition module 30-1 configured to input the original image and the target style vector into an image generation network in the image generation model, to obtain a target image having the target style;
The image generation model is obtained according to the training method of the image generation model in the method embodiment.
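At inference time, the cooperation of the three modules above reduces to a style-vector lookup followed by one forward pass through the image generation network. The sketch below assumes PyTorch; the dictionary keyed by style name and the function signature are illustrative assumptions.

```python
import torch

def generate_target_image(generator, style_vectors: dict,
                          original_image: torch.Tensor,
                          target_style: str) -> torch.Tensor:
    # Determine the target style vector corresponding to the requested style.
    target_style_vector = style_vectors[target_style]
    # Input the original image and the target style vector into the image
    # generation network to obtain a target image having the target style.
    with torch.no_grad():
        return generator(original_image, target_style_vector)
```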
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and will not be repeated here.
In an exemplary embodiment, there is also provided an electronic device including a processor and a memory for storing processor-executable instructions, wherein the processor is configured to implement the steps of the training method of the image generation model or of the image generation method in the above embodiments when executing the instructions stored in the memory.
The electronic device may be a terminal, a server, or a similar computing device. Taking a server as an example, fig. 12 is a block diagram of an electronic device for the training method of the image generation model or the image generation method according to an exemplary embodiment. The electronic device 1000 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1010 (the processor 1010 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 1030 for storing data, and one or more storage media 1020 (such as one or more mass storage devices) for storing application programs 1023 or data 1022. The memory 1030 and the storage medium 1020 may be transitory or persistent storage. The program stored on the storage medium 1020 may include one or more modules, each of which may include a series of instruction operations for the electronic device. Further, the central processing unit 1010 may be configured to communicate with the storage medium 1020 and execute on the electronic device 1000 the series of instruction operations in the storage medium 1020. The electronic device 1000 may also include one or more power supplies 1060, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1040, and/or one or more operating systems 1021, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 1040 may be used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the electronic device 1000. In one example, the input/output interface 1040 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices via a base station to communicate with the Internet. In an exemplary embodiment, the input/output interface 1040 may be a Radio Frequency (RF) module for communicating with the Internet in a wireless manner.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 12 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, electronic device 1000 may also include more or fewer components than shown in FIG. 12 or have a different configuration than shown in FIG. 12.
In an exemplary embodiment, there is also provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the training method of the image generation model or the image generation method provided in any one of the above embodiments.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the electronic device performs the training method or the image generation method of the image generation model provided in any one of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may carry out the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. A method of training an image generation model, comprising:
acquiring at least two sample image sets, each sample image set having a different image style;
acquiring an image generation model to be trained and a discriminator corresponding to each image style, wherein the image generation model comprises an image generation network and a style vector corresponding to each image style; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style;
inputting sample images in each sample image set and each style vector into the image generation network to obtain a stylized image corresponding to each style vector;
inputting each stylized image into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network relative to the discriminator;
obtaining an image generation loss generated by the image generation network according to the sample image, each stylized image and the second discriminator loss corresponding to each discriminator;
training the corresponding discriminator according to the first discriminator loss, and training the image generation network and each of the style vectors according to the image generation loss;
and determining the trained image generation network and each trained style vector as the image generation model.
2. The method of training an image generation model according to claim 1, wherein the inputting each of the stylized images into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator, the first discriminator loss characterizing a discrimination accuracy loss of the discriminator and the second discriminator loss characterizing an image generation accuracy loss of the image generation network with respect to the discriminator, comprises:
inputting each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image;
and calculating the first discriminator loss according to the difference value between the discrimination result and the true value of the stylized image.
3. The method of training an image generation model according to claim 2, wherein the inputting each of the stylized images into a corresponding discriminator to obtain a first discriminator loss and a second discriminator loss generated by the discriminator, the first discriminator loss characterizing a discrimination accuracy loss of the discriminator and the second discriminator loss characterizing an image generation accuracy loss of the image generation network with respect to the discriminator, further comprises:
and calculating the second discriminant loss according to the difference between the discrimination result and the expected value of the stylized image.
4. The training method of an image generation model according to claim 3, wherein said obtaining an image generation loss generated by said image generation network based on said sample image, each of said stylized images, and said second discriminator loss corresponding to each of said discriminators comprises:
according to the high-frequency components of any two stylized images, calculating to obtain a high-frequency loss; the arbitrary two stylized images are generated based on the same sample image;
determining a target stylized image in each stylized image, wherein the target stylized image is generated by the image generation network according to a style vector corresponding to the image style of the sample image and the sample image;
calculating to obtain consistency loss according to pixel differences between the target stylized image and the sample image;
and obtaining the image generation loss according to the high-frequency loss, the consistency loss and the second discriminator loss corresponding to each discriminator.
5. The training method of an image generation model according to any one of claims 1 to 4, characterized in that the method further comprises:
training to obtain the image generation model when the image generation loss is smaller than a preset loss threshold value;
or,
when the number of training iterations of the training target is greater than a preset training-count threshold, obtaining the image generation model, wherein the training target comprises one or more of the discriminator, the image generation network, and the style vector.
6. An image generation method, the method comprising:
acquiring an original image and a target style, wherein the target style represents a style corresponding to a target image to be generated;
according to the target style, determining a corresponding target style vector in an image generation model;
inputting the original image and the target style vector into an image generation network in the image generation model to obtain a target image with the target style;
the image generation model is trained according to the training method of the image generation model according to any one of claims 1 to 5.
7. An image generation model training apparatus, comprising:
a sample image set determination module configured to perform acquiring at least two sample image sets, each of the sample image sets having a different image style;
a training object acquisition module configured to perform acquisition of an image generation model to be trained and a discriminator corresponding to each of the image styles, the image generation model including an image generation network and a style vector corresponding to each of the image styles; the style vector is used for triggering the image generation network to generate a stylized image with a corresponding image style;
an image generation module configured to input the sample images in each sample image set and each of the style vectors into the image generation network to obtain a stylized image corresponding to each of the style vectors;
a first loss calculation module configured to perform inputting each of the stylized images into a corresponding discriminator, resulting in a first discriminator loss and a second discriminator loss generated by the discriminators; the first discriminator loss characterizes a discrimination accuracy loss of the discriminator, and the second discriminator loss characterizes an image generation accuracy loss of the image generation network relative to the discriminator;
a second loss calculation module configured to obtain an image generation loss generated by the image generation network according to the sample image, each of the stylized images, and the second discriminator loss corresponding to each of the discriminators;
a training module configured to train the corresponding discriminator based on the first discriminator loss, and to train the image generation network and each of the style vectors based on the image generation loss; and to determine the trained image generation network and each trained style vector as the image generation model.
8. The image generation model training apparatus of claim 7 wherein:
the first loss calculation module is configured to execute inputting each stylized image into a corresponding discriminator to obtain a discrimination result generated by the discriminator; the discriminator corresponds to a style vector corresponding to the stylized image; and calculating the first discriminator loss according to the difference value between the discrimination result and the true value of the stylized image.
9. The image generation model training apparatus of claim 8 wherein:
the first loss calculation module is configured to calculate the second discriminator loss according to a difference between the discrimination result and an expected value of the stylized image.
10. The image generation model training apparatus of claim 9, wherein the second loss calculation module comprises:
a high-frequency loss calculation unit configured to perform calculation of a high-frequency loss from high-frequency components of any two stylized images; the arbitrary two stylized images are generated based on the same sample image;
a target stylized image determination unit configured to perform determination of a target stylized image among the respective stylized images, the target stylized image being generated by the image generation network from a style vector corresponding to an image style of the sample image and the sample image;
a consistency loss calculation unit configured to perform calculation of a consistency loss from pixel differences between the target stylized image and the sample image;
an image generation loss calculation unit configured to obtain the image generation loss from the high-frequency loss, the consistency loss, and the second discriminator loss corresponding to each of the discriminators.
11. The image generation model training apparatus of any one of claims 7 to 10, further comprising a training control module configured to obtain the image generation model when the image generation loss is less than a preset loss threshold, or to obtain the image generation model when the number of training iterations of a training target is greater than a preset threshold, wherein the training target comprises one or more of the discriminator, the image generation network, or the style vector.
12. An image generation apparatus, the apparatus comprising:
a generation element acquisition module configured to acquire an original image and a target style, wherein the target style represents a style corresponding to a target image to be generated;
a target style vector acquisition module configured to determine a corresponding target style vector in an image generation model according to the target style;
a target image acquisition module configured to input the original image and the target style vector into an image generation network in the image generation model to obtain a target image having the target style;
the image generation model is trained according to the training method of the image generation model according to any one of claims 1 to 5.
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the image generation model of any one of claims 1 to 5 or the image generation method of claim 6.
14. A computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the training method of an image generation model of any one of claims 1 to 5 or the image generation method of claim 6.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the training method of an image generation model according to any one of claims 1 to 5 or the image generation method according to claim 6.
CN202110084071.4A 2021-01-21 2021-01-21 Image generation model training, image generation method, image generation device and storage medium Active CN112967174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110084071.4A CN112967174B (en) 2021-01-21 2021-01-21 Image generation model training, image generation method, image generation device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110084071.4A CN112967174B (en) 2021-01-21 2021-01-21 Image generation model training, image generation method, image generation device and storage medium

Publications (2)

Publication Number Publication Date
CN112967174A CN112967174A (en) 2021-06-15
CN112967174B true CN112967174B (en) 2024-02-09

Family

ID=76271322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110084071.4A Active CN112967174B (en) 2021-01-21 2021-01-21 Image generation model training, image generation method, image generation device and storage medium

Country Status (1)

Country Link
CN (1) CN112967174B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658324A (en) * 2021-08-03 2021-11-16 Oppo广东移动通信有限公司 Image processing method and related equipment, migration network training method and related equipment
CN113837934B (en) * 2021-11-26 2022-02-22 北京市商汤科技开发有限公司 Image generation method and device, electronic equipment and storage medium
CN113850804B (en) * 2021-11-29 2022-03-18 北京鹰瞳科技发展股份有限公司 Retina image generation system and method based on generation countermeasure network
CN117808934A (en) * 2022-09-29 2024-04-02 华为技术有限公司 Data processing method and related equipment
CN115439894B (en) * 2022-11-08 2023-04-11 荣耀终端有限公司 Method, electronic device, program product, and medium for training fingerprint matching model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658347A (en) * 2018-11-14 2019-04-19 天津大学 Data enhancement methods that are a kind of while generating plurality of picture style
CN109816589A (en) * 2019-01-30 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating cartoon style transformation model
CN110135574A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Neural network training method, image generating method and computer storage medium
CN110310221A (en) * 2019-06-14 2019-10-08 大连理工大学 A kind of multiple domain image Style Transfer method based on generation confrontation network
CN110458216A (en) * 2019-07-31 2019-11-15 中山大学 The image Style Transfer method of confrontation network is generated based on condition
CN110516201A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN111402112A (en) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111476708A (en) * 2020-04-03 2020-07-31 广州市百果园信息技术有限公司 Model generation method, model acquisition method, model generation device, model acquisition device, model generation equipment and storage medium
CN111814655A (en) * 2020-07-03 2020-10-23 浙江大华技术股份有限公司 Target re-identification method, network training method thereof and related device
CN111862274A (en) * 2020-07-21 2020-10-30 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, and image style migration method and device
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN112232485A (en) * 2020-10-15 2021-01-15 中科人工智能创新技术研究院(青岛)有限公司 Cartoon style image conversion model training method, image generation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565757B2 (en) * 2017-06-09 2020-02-18 Adobe Inc. Multimodal style-transfer network for applying style features from multi-resolution style exemplars to input images

Also Published As

Publication number Publication date
CN112967174A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN112967174B (en) Image generation model training, image generation method, image generation device and storage medium
CN109241903B (en) Sample data cleaning method, device, computer equipment and storage medium
US11816925B2 (en) Face detection method and apparatus, computer device, and storage medium
CN106778928B (en) Image processing method and device
CN111598998A (en) Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN110930296B (en) Image processing method, device, equipment and storage medium
CN111598779B (en) Image super-resolution processing method and device, electronic equipment and storage medium
CN112328715B (en) Visual positioning method, training method of related model, related device and equipment
CN110781976B (en) Extension method of training image, training method and related device
CN111950406A (en) Finger vein identification method, device and storage medium
CN110147833A (en) Facial image processing method, apparatus, system and readable storage medium storing program for executing
CN110472588B (en) Anchor point frame determining method and device, computer equipment and storage medium
CN110910512B (en) Virtual object self-adaptive adjustment method, device, computer equipment and storage medium
Lin et al. Spectrum prediction based on GAN and deep transfer learning: A cross-band data augmentation framework
CN113610989B (en) Method and device for training style migration model and method and device for style migration
CN111612732B (en) Image quality evaluation method, device, computer equipment and storage medium
CN113065553A (en) Data processing method and device, three-dimensional scanning system and electronic device
CN109711287A (en) Face acquisition method and Related product
CN113609097A (en) Fingerprint library generation method and device, computer equipment and storage medium
CN113065593A (en) Model training method and device, computer equipment and storage medium
CN109934926B (en) Model data processing method, device, readable storage medium and equipment
CN109784379A (en) The update method and device in textile picture feature library
CN114677578A (en) Method and device for determining training sample data
CN114880363A (en) Data center flow prediction system, training method and prediction method
CN113807229A (en) Non-contact attendance checking device, method, equipment and storage medium for intelligent classroom

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant