WO2022088878A1

WO2022088878A1 - Style image generation method, model training method and apparatus, and device and medium

Info

Publication number: WO2022088878A1
Application number: PCT/CN2021/114211
Authority: WO
Inventors: 尹淳骥; 张耀; 李文越
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2020-10-30
Filing date: 2021-08-24
Publication date: 2022-05-05
Also published as: CN112991148B; CN112991148A

Abstract

A style image generation method, a model training method and apparatus, and a device and a storage medium. The method comprises: firstly, receiving a first style image; and then, after performing style conversion processing on the first style image, generating a second style image. By means of the method, a hidden space of an image generation model is constrained during a training process of the image generation model, such that the image generation model applied to the style image generation method can generate a style image with a relatively good quality and result, thereby improving the user experience.

Description

Style image generation method, model training method, device, equipment and medium

This disclosure claims priority to Chinese patent application No. 202011197824.4, titled "Style Image Generation Method, Model Training Method, Apparatus, Equipment and Medium" and filed with the China Patent Office on October 30, 2020, the entire contents of which are incorporated by reference in this disclosure.

Technical field

The present disclosure relates to the technical field of data processing, and in particular, to a style image generation method, a model training method, an apparatus, a device and a storage medium.

Background

With the continuous development of image processing technology, image style conversion has become a popular and entertaining feature of image applications. Specifically, image style conversion refers to converting an image from one style to another style that meets the user's needs.

At present, machine learning models are commonly used in image processing; for example, image style conversion can be implemented with such a model. However, how to improve the quality of the style images obtained through model-based image style conversion is a technical problem that urgently needs to be solved.

Summary of the invention

In order to solve, or at least partially solve, the above technical problem, the present disclosure provides a style image generation method, a model training method, an apparatus, a device and a storage medium, which can generate high-quality style images and improve user experience.

In a first aspect, the present disclosure provides a style image generation method executed by an image generation model, the method comprising:

receiving a first style image;

generating a second style image after performing style conversion processing on the first style image;

wherein the image generation model is obtained by training based on first style image samples and second style image samples having a corresponding relationship, and the latent space of the image generation model is constrained during the training process.

In an optional embodiment, the latent space of the image generation model is constrained to be a vector dictionary containing a preset number of latent vectors; generating a second style image after performing style conversion processing on the first style image includes:

extracting the feature vector of the first style image;

determining, from the vector dictionary, the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector;

generating, based on the target vector, a second style image corresponding to the first style image.

In an optional implementation manner, the latent space of the image generation model is constrained to be a normal distribution; generating a second style image after performing style conversion processing on the first style image includes:

mapping the first style image to the normal distribution to obtain a feature vector of the first style image;

generating, based on the feature vector, a second style image corresponding to the first style image.

In an optional implementation manner, before generating the second style image corresponding to the first style image based on the feature vector, the method further includes:

updating the feature vector based on a target weight coefficient to obtain an updated feature vector, where the target weight coefficient is used to represent the distance between the feature vector and the origin of the normal distribution;

Correspondingly, generating a second style image corresponding to the first style image based on the feature vector includes:

generating, based on the updated feature vector, a second style image corresponding to the first style image.

In an optional embodiment, before the feature vector is updated based on the target weight coefficient to obtain the updated feature vector, the method further includes:

The target weight coefficient is acquired in response to an input operation of the target weight coefficient.

In an optional implementation manner, the first style image includes a line art style image, and the second style image includes a comic style image.

In a second aspect, the present disclosure provides a training method for an image generation model, the method comprising:

obtaining image samples of the first style and image samples of the second style that have a corresponding relationship;

In the process of training based on the first style image samples and the second style image samples having the corresponding relationship, the latent space is constrained to obtain a trained image generation model.

In an optional embodiment, constraining the latent space in the process of training based on the first style image samples and the second style image samples having a corresponding relationship, to obtain a trained image generation model, includes:

extracting the feature vector of the first style image sample;

determining, from a vector dictionary, the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector, where the vector dictionary stores a preset number of latent vectors;

generating, based on the target vector, a first output image corresponding to the first style image sample;

inputting the second style image sample and the first output image corresponding to the first style image sample into a discriminator, and obtaining a loss value after processing by the discriminator;

updating the vector dictionary based on the loss value, and entering the next round of iterative training until a preset convergence condition is reached, to obtain a trained image generation model.

In an optional implementation manner, before the extracting the feature vector of the first style image sample, the method further includes:

Perform image enhancement processing on the first style image sample based on the target image enhancement method to obtain an enhanced image sample;

Correspondingly, the extracting the feature vector of the first style image sample includes:

Feature vectors of the enhanced image samples are extracted.

In an optional embodiment, constraining the latent space in the process of training based on the first style image samples and the second style image samples having a corresponding relationship, to obtain a trained image generation model, includes:

mapping the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample;

updating the current vector distribution based on the standard normal distribution and the feature vector, using the maximum mean discrepancy (MMD) algorithm;

generating a first output image corresponding to the first style image sample based on the feature vector;

inputting the second style image sample and the first output image corresponding to the first style image sample into a discriminator to complete the current round of iterative training, and entering the next round of iterative training until a preset convergence condition is reached, to obtain a trained image generation model.
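As a rough illustration of the distribution-matching step in this embodiment, the sketch below computes a squared maximum mean discrepancy (MMD, sometimes translated as maximum average difference) between a batch of feature vectors and samples drawn from a standard normal distribution. The Gaussian kernel, the bandwidth value, the biased estimator, and the function names `gaussian_kernel` and `mmd_squared` are assumptions made for illustration; the disclosure does not specify them.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Pairwise Gaussian kernel values between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(features, reference, bandwidth=2.0):
    """Biased estimate of the squared MMD between two sample sets.

    features:  (n, d) feature vectors produced by a (hypothetical) encoder.
    reference: (m, d) samples drawn from the target distribution.
    """
    k_ff = gaussian_kernel(features, features, bandwidth).mean()
    k_rr = gaussian_kernel(reference, reference, bandwidth).mean()
    k_fr = gaussian_kernel(features, reference, bandwidth).mean()
    return k_ff + k_rr - 2.0 * k_fr


rng = np.random.default_rng(0)
reference = rng.standard_normal((200, 8))   # samples from N(0, I)
close = rng.standard_normal((200, 8))       # features already near N(0, I)
far = rng.standard_normal((200, 8)) + 3.0   # features shifted away from it
print(mmd_squared(close, reference), mmd_squared(far, reference))
```

In a training loop, a value of this kind could serve as a penalty that pulls the distribution of feature vectors toward the standard normal distribution; how the disclosure weights or applies it is not stated here.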

In an optional implementation manner, before mapping the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample, the method further includes:

performing image enhancement processing on the first style image sample based on a target image enhancement method to obtain an enhanced image sample;

Correspondingly, mapping the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample includes:

mapping the enhanced image sample to the current vector distribution to obtain the feature vector of the first style image sample.

In a third aspect, the present disclosure provides a style image generation apparatus, the apparatus is applied to an image generation model, and the apparatus includes:

a receiving module for receiving the first style image;

a generating module, configured to generate a second style image after performing style conversion processing on the first style image;

In a fourth aspect, the present disclosure provides an apparatus for training an image generation model, the apparatus comprising:

an acquisition module, configured to acquire image samples of the first style and image samples of the second style that have a corresponding relationship;

The constraint module is configured to constrain the latent space in the process of training based on the first style image sample and the second style image sample having the corresponding relationship to obtain a trained image generation model.

In a fifth aspect, the present disclosure provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a terminal device, the terminal device is made to implement the above method.

In a sixth aspect, the present disclosure provides a device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above method when executing the computer program.

Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:

An embodiment of the present disclosure provides a style image generation method. First, an image generation model receives a first style image, and then, after performing style conversion processing on the first style image, a second style image is generated. The embodiments of the present disclosure constrain the latent space of the image generation model during the training of the image generation model, so that the image generation model can be used to generate style images with better quality and effect, thereby improving user experience.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Other drawings can also be obtained from these drawings without creative effort.

FIG. 1 is an application environment architecture diagram of a style image generation method provided by an embodiment of the present disclosure;

FIG. 2 is a flowchart of a method for generating a style image according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the effect of implementing image style conversion based on an image generation model according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a training process of an image generation model according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of a training method for an image generation model provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a model training process for constraining the latent space of an image generation model according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a model training method for constraining the latent space of an image generation model according to an embodiment of the present disclosure;

FIG. 8 is an image enhancement effect diagram provided by an embodiment of the present disclosure;

FIG. 9 is a flowchart of a method for generating a style image according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of another model training process for constraining the latent space of an image generation model according to an embodiment of the present disclosure;

FIG. 11 is a flowchart of another model training method for constraining the latent space of an image generation model according to an embodiment of the present disclosure;

FIG. 12 is a flowchart of another style image generation method provided by an embodiment of the present disclosure;

FIG. 13 is a schematic structural diagram of a style image generating apparatus according to an embodiment of the present disclosure;

FIG. 14 is a schematic structural diagram of an apparatus for training an image generation model according to an embodiment of the present disclosure;

FIG. 15 is a schematic structural diagram of a style image generating device according to an embodiment of the present disclosure;

FIG. 16 is a schematic structural diagram of a training device for an image generation model according to an embodiment of the present disclosure.

Detailed description

In order to more clearly understand the above objects, features and advantages of the present disclosure, the solutions of the present disclosure will be further described below. It should be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other under the condition of no conflict.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure can also be implemented in ways other than those described herein; evidently, the embodiments in the specification are only some of the embodiments of the present disclosure, not all of them.

The present disclosure provides a style image generation method performed by an image generation model whose latent space is constrained during training. As a result, after the image generation model receives a first style image and performs style conversion processing on it, the quality and effect of the generated second style image can be guaranteed, thereby improving user experience.

Before the specific introduction of the style image generation method provided by the present disclosure, the application environment of the style image generation method provided by the present disclosure is first introduced.

Referring to FIG. 1, which is an application environment architecture diagram of a style image generation method provided by an embodiment of the present disclosure, the style image generation method can be applied to the server 102. Specifically, after obtaining the first style image, the terminal 101 sends it to the server 102 through the network connection between the terminal 101 and the server 102, and the image generation model deployed on the server 102 performs style conversion processing on the first style image to generate a second style image. The terminal 101 can be a desktop computer or a mobile terminal (such as a smart phone, smart glasses, a tablet computer, a laptop computer, a wearable electronic device, or a smart home device), and the server 102 can be implemented as an independent server or as a cluster of multiple servers.

In practical applications, after acquiring the first style image, the terminal 101 sends it to the server 102. After receiving the first style image, the server 102 performs style conversion processing to generate a second style image. The image generation model is obtained through training by the server 102 based on the first style image samples and the second style image samples having the corresponding relationship, and the latent space of the image generation model is constrained during the training process.

In another application environment, the style image generation method provided by the embodiment of the present disclosure can be applied directly to the terminal 101. Specifically, after receiving the first style image, the terminal 101 inputs it into the image generation model, and the image generation model performs style conversion processing to generate a second style image. The image generation model is obtained by the terminal 101 through training based on the first style image samples and the second style image samples having the corresponding relationship, and the latent space of the image generation model is constrained during the training process.

On the basis of the above application environment, the style image generation method provided by the present disclosure will be specifically introduced below.

Specifically, an embodiment of the present disclosure provides a method for generating a style image. Referring to FIG. 2, which is a flowchart of a method for generating a style image provided by an embodiment of the present disclosure, the method can be applied to an image generation model and includes:

S201: Receive a first style image.

In the embodiment of the present disclosure, the first style image may be an image of any style, such as a line art style image, a two-dimensional style image, a sketch style image, an oil painting style image, a cartoon style image, a real style image, or a comic style image.

In practical applications, the first style image may be an image captured in real time by a camera invoked by the terminal, an image drawn in real time by a user on a terminal interface, or an image obtained from an album of the terminal. This disclosure does not limit this.

In an optional embodiment, the first style image may include a face. For example, the first style image may be a face image, and the style of the face image can be converted based on the style image generation method provided by the embodiment of the present disclosure, such as converting a line art face image into a comic face image.

S202: After performing style conversion processing on the first style image, a second style image is generated.

The image generation model is obtained by training based on the image samples of the first style and the image samples of the second style that have a corresponding relationship, and the latent space of the image generation model is constrained during the training process.

In the embodiment of the present disclosure, the image generation model is used to convert the first style image into the second style image. Among them, the first style and the second style belong to two different styles of the image. Specifically, the style of the image may include a line art style, a two-dimensional style, a sketch style, an oil painting style, a cartoon style, a real style, and a comic style. The image generation model provided by the embodiments of the present disclosure is used to convert an image from one style to another.

Referring to FIG. 3, which is a schematic diagram of the effect of implementing image style conversion based on an image generation model according to an embodiment of the present disclosure. Taking a line art style image as the first style image and a comic style image as the second style image as an example, after the first style image is input into the image generation model and processed by it, a second style image corresponding to the first style image is obtained, achieving the effect of converting a line art style image into a comic style image.

In practical applications, before applying the image generation model, the image generation model is first trained. In the embodiment of the present disclosure, the image generation model is trained based on the first style image samples and the second style image samples with the corresponding relationship, and the trained image generation model is obtained, which is used to convert the style of the image.

In the process of training the image generation model, the embodiment of the present disclosure constrains the latent space of the image generation model. The latent space, constrained over multiple rounds of iterative training based on high-quality image samples, can then better constrain the feature vector of the input first style image when the image generation model is applied, so that a second style image of higher quality is finally generated, which improves the user experience.

Here, constraining the latent space refers to constraining the latent vectors (latent codes) in the latent space. Constraining the latent vectors in turn constrains the feature vector that is input to the decoder of the image generation model in the model application stage, so that decoding by the decoder yields a higher quality second style image.

In the style image generation method provided by the embodiment of the present disclosure, the latent space of the image generation model is constrained during the training process, so that in the application stage the image generation model can generate style images with higher quality and better effect, improving the user experience.

In order to facilitate understanding of the solution, before introducing the subsequent style image generation method, the embodiment of the present disclosure first introduces the training method of the image generation model. FIG. 4 is a schematic diagram of a training process of an image generation model provided by an embodiment of the present disclosure, in which the image generation model to be trained may be a conditional generative adversarial network (CGAN) that includes a generator and a discriminator. The conditional generative adversarial network is trained to obtain the trained image generation model.

In the actual training process, the first style image samples and the second style image samples having a corresponding relationship are first obtained. The first style image sample is input into the generator to obtain an output image; the output image of the generator and the second style image sample corresponding to the first style image sample are then input into the discriminator together, and the parameters in the generator are adjusted based on the loss value output by the discriminator, which completes the current round of iterative training for the generator. Following this training method, based on a large number of high-quality first style image samples and second style image samples having a corresponding relationship, the discriminator is used to train the generator over multiple rounds until the preset convergence condition is reached, yielding the trained image generation model.

In an optional implementation manner, the second style image sample may be drawn by a professional based on the first style image sample. The first style image sample and the second style image sample therefore have a corresponding relationship, and the quality of image samples drawn by a professional can be guaranteed, which in turn ensures that the image generation model trained on these image samples can generate high-quality style images.

Referring to FIG. 5 , a flowchart of an image generation model training method provided in an embodiment of the present disclosure, wherein the method can be applied to an image generation model to be trained, and the method includes:

S501: Acquire a first style image sample and a second style image sample having a corresponding relationship.

The first style image sample and the second style image sample may be images of two different styles among a line art style image, a two-dimensional style image, a sketch style image, an oil painting style image, a cartoon style image, a real style image, a comic style image, and so on. For example, the first style image sample includes a line art style image, and the second style image sample includes a comic style image.

S502: In the process of training based on the first style image sample and the second style image sample having the corresponding relationship, constrain the latent space of the image generation model to obtain a trained image generation model.

In the embodiment of the present disclosure, in the process of training the image generation model, the latent space of the image generation model is continuously constrained in each round of iterative training, and an image generation model with a constrained latent space is finally obtained, which can be used for image style conversion.

In order to further introduce the training process of the image generation model, the embodiments of the present disclosure provide the following two model training methods for constraining the latent space of the image generation model.

Referring to FIG. 6, which is a schematic diagram of a model training process for constraining the latent space of an image generation model provided by an embodiment of the present disclosure, the generator of the image generation model includes an encoder and a decoder. The generator is trained by adjusting the parameters in the encoder and the decoder, yielding the trained image generation model.

In the actual training process, the first style image sample is input to the encoder in the generator, and the encoder extracts the feature vector of the first style image sample. The latent vector with the smallest distance from the feature vector is then determined from the vector dictionary as the target vector corresponding to the feature vector, where the vector dictionary is used to store a preset number of latent vectors. After the target vector corresponding to the feature vector is determined, the target vector is input into the decoder, and the decoder decodes it to obtain the first output image corresponding to the first style image sample. The first output image and the second style image sample corresponding to the first style image sample are input into the discriminator together, and the loss value is obtained after processing by the discriminator. The vector dictionary is then updated based on the loss value, so that the latent vectors in the vector dictionary are continuously adjusted toward vectors that can generate higher quality style images. Following this training method, a trained image generation model is finally obtained through multiple rounds of iterative training based on high-quality image samples, and is used to convert the style of images.

In the training method of the image generation model provided by the embodiment of the present disclosure, the latent vectors in the vector dictionary are continuously adjusted during training, which constrains the latent space of the image generation model, so that the image generation model with a constrained latent space can generate high-quality style images and improve user experience.

Corresponding to the above training method, an embodiment of the present disclosure also provides a flowchart of a model training method for constraining the latent space of an image generation model. Referring to FIG. 7 , the method includes:

S701: Acquire a first style image sample and a second style image sample having a corresponding relationship.

S702: Extract the feature vector of the first style image sample.

In the embodiment of the present disclosure, when the first style image sample is a line art image sample and the second style image sample is a comic image sample, after the first style image sample is input into the encoder of the image generation model to be trained, the feature vectors extracted by the encoder may be N M-dimensional feature vectors (N is a positive integer, for example 25; M is a positive integer, for example 64).

In order to improve the robustness of the image generation model, in this embodiment of the present disclosure, after the first style image sample is input into the image generation model to be trained, the model first performs image enhancement processing on the first style image sample based on the target image enhancement method to obtain an enhanced image sample, and then extracts the feature vector of the enhanced image sample. The target image enhancement method may be one of image enhancement methods such as random dilation, erosion, rotation, translation, scaling, and deformation.

In the embodiment of the present disclosure, random image enhancement processing is performed on the image samples during the training of the image generation model, so that the trained image generation model places lower requirements on the quality of the images input to it. That is, good style conversion results can be obtained for input images of different quality. For example, the requirements on the stroke thickness of a line art style image are low: for line art style images with different stroke thicknesses, a comic style image with good effect can be generated. FIG. 8 is an image enhancement effect diagram provided by an embodiment of the present disclosure, in which comic style images of good quality are obtained from two line art style images A and B of different quality.
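To make the stroke-thickness point concrete, the following is a minimal sketch of morphological dilation (expansion) and erosion applied at random to a line art image held as a 2-D array. The square window, the radius range, the assumption that strokes are bright on a dark background, and the names `dilate`, `erode` and `random_stroke_jitter` are illustrative choices, not details given in the disclosure.

```python
import numpy as np


def _window_reduce(image, radius, reduce_fn):
    """Reduce each pixel's (2r+1) x (2r+1) neighbourhood with reduce_fn."""
    height, width = image.shape
    padded = np.pad(image, radius, mode="edge")
    result = image.copy()
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            result = reduce_fn(result, padded[dy:dy + height, dx:dx + width])
    return result


def dilate(image, radius=1):
    """Each pixel becomes the window maximum, which thickens bright strokes."""
    return _window_reduce(image, radius, np.maximum)


def erode(image, radius=1):
    """Each pixel becomes the window minimum, which thins bright strokes."""
    return _window_reduce(image, radius, np.minimum)


def random_stroke_jitter(line_art, rng, max_radius=2):
    """Randomly thicken or thin the strokes of a line art image."""
    radius = int(rng.integers(1, max_radius + 1))
    operation = dilate if rng.random() < 0.5 else erode
    return operation(line_art, radius)
```

For dark strokes on a white background the roles of the two operations swap, so `erode` would thicken the lines instead.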

In an optional embodiment, image enhancement processing is performed on the first style image sample based on a random deformation image enhancement method. Specifically, a preset image is divided into N×N squares of the same size, and an offset vector (dx, dy) is randomly determined at the center of each square. The offset vector is then spread linearly over the entire square, with the offset at the border of the square kept at 0, to obtain the offset field corresponding to the preset image. Finally, image enhancement processing is performed on the first style image sample according to this offset field to obtain an enhanced image sample.
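The random deformation described above can be sketched as follows. This is one possible reading under stated assumptions: the image is square with a side length divisible by the number of squares, the linear spread is modelled as a separable tent weight that is exactly 0 on each square's border and peaks near its center, offsets are drawn uniformly from a hypothetical range `max_shift`, and the warp uses nearest-neighbour sampling. The names `random_offset_field` and `warp` are made up for this example.

```python
import numpy as np


def random_offset_field(side, n, max_shift, rng):
    """Build a (side, side, 2) field of (dx, dy) offsets from n x n squares."""
    cell = side // n
    if cell * n != side or cell < 2:
        raise ValueError("side must be a multiple of n with squares >= 2 px")
    # Tent profile along one axis: 0 at both borders, largest near the centre.
    t = np.arange(cell) / (cell - 1)
    tent = 1.0 - np.abs(2.0 * t - 1.0)
    weight = np.outer(tent, tent)                 # (cell, cell)
    # One random offset vector per square.
    centers = rng.uniform(-max_shift, max_shift, size=(n, n, 2))
    # np.kron tiles the weight patch, scaled by each square's offset.
    dx = np.kron(centers[:, :, 0], weight)
    dy = np.kron(centers[:, :, 1], weight)
    return np.stack([dx, dy], axis=-1)


def warp(image, field):
    """Resample image at positions shifted by field (nearest neighbour)."""
    height, width = image.shape[:2]
    ys, xs = np.mgrid[0:height, 0:width]
    src_x = np.clip(np.rint(xs + field[..., 0]), 0, width - 1).astype(int)
    src_y = np.clip(np.rint(ys + field[..., 1]), 0, height - 1).astype(int)
    return image[src_y, src_x]


rng = np.random.default_rng(0)
sample = rng.random((64, 64))                     # stand-in for a line art image
field = random_offset_field(side=64, n=4, max_shift=3.0, rng=rng)
enhanced = warp(sample, field)
```

Because the offset vanishes on every square border, neighbouring squares join without tearing even though their center offsets are independent.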

S703: From the vector dictionary, determine the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector; a preset number of latent vectors is stored in the vector dictionary.

In the embodiment of the present disclosure, the vector dictionary may include 64*64 64-dimensional latent vectors. After the N 64-dimensional feature vectors corresponding to the first style image sample are extracted, for each of the N feature vectors, the latent vector with the smallest distance from that feature vector is obtained from the vector dictionary and used as the target vector of that feature vector.
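The lookup in S703 can be sketched as a nearest-neighbour search over the dictionary. The example below uses the sizes mentioned in the text (64*64 dictionary entries, 64 dimensions, N = 25 feature vectors); the use of Euclidean distance, the random stand-in data, and the function name `nearest_latent_vectors` are assumptions, since the disclosure only says "smallest distance".

```python
import numpy as np


def nearest_latent_vectors(features, dictionary):
    """For each feature vector, pick the closest latent (hidden) vector.

    features:   (N, M) feature vectors from the encoder.
    dictionary: (K, M) latent vectors stored in the vector dictionary.
    Returns the (N, M) target vectors and their (N,) dictionary indices.
    """
    # Squared Euclidean distance via |f|^2 - 2 f.d + |d|^2, shape (N, K).
    sq_dist = (
        (features ** 2).sum(axis=1)[:, None]
        - 2.0 * features @ dictionary.T
        + (dictionary ** 2).sum(axis=1)[None, :]
    )
    indices = sq_dist.argmin(axis=1)
    return dictionary[indices], indices


rng = np.random.default_rng(0)
dictionary = rng.standard_normal((64 * 64, 64))   # stand-in vector dictionary
features = rng.standard_normal((25, 64))          # stand-in encoder output
targets, indices = nearest_latent_vectors(features, dictionary)
print(targets.shape, indices.shape)
```

The target vectors returned here are what would replace the feature vectors before decoding in the next step.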

S704: Based on the target vector, generate a first output image corresponding to the first style image sample.

In the embodiment of the present disclosure, after replacing the corresponding feature vector with the target vector, the replaced target vector is input into the decoder, and the decoder decodes to obtain the first output image corresponding to the first style image sample.

S705: Input the second style image sample and the first output image corresponding to the first style image sample into the discriminator, and obtain the loss value after being processed by the discriminator.

S706: Update the vector dictionary based on the loss value, and enter the next round of iterative training until a preset convergence condition is reached, and a trained image generation model is obtained.

In the embodiment of the present disclosure, in the process of model training, the training objective of the generator is to minimize the loss value, and the training objective of the discriminator is to maximize the loss value. In the adversarial training between the generator and the discriminator, the parameters in the model are continuously adjusted so that the discriminator cannot identify the authenticity of the first output image output by the generator, and a trained image generation model is finally obtained.

In the embodiment of the present disclosure, the vector dictionary is used as a parameter in the model. During the adversarial training between the generator and the discriminator, the latent vectors in the vector dictionary are continuously adjusted, so that the trained image generation model can generate style images of higher quality.
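The opposing objectives of the generator and the discriminator can be made concrete with a small numeric sketch. The disclosure does not give a formula for the loss value; the standard adversarial value log D(real) + log(1 - D(fake)) is assumed here, and `adversarial_value` and the example scores are hypothetical.

```python
import math

def adversarial_value(real_scores, fake_scores):
    """Mean of log D(real) + log(1 - D(fake)) for discriminator scores in (0, 1)."""
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

# Discriminator that separates second style samples from generated images well:
# the value is close to its maximum of 0.
confident = adversarial_value([0.99, 0.98], [0.01, 0.02])
# Discriminator that cannot tell them apart: every score is 0.5.
fooled = adversarial_value([0.5, 0.5], [0.5, 0.5])
```

Under this assumed form, the discriminator pushes the value up toward `confident`, while the generator (including the vector dictionary, which is a model parameter) pushes it down toward `fooled`.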

Based on the image generation model obtained by the model training method for constraining the latent space of the image generation model, the first style image is converted into the second style image. Specifically, referring to FIG. 9 , which is a flowchart of a method for generating a style image provided by an embodiment of the present disclosure, the method includes:

S901: Acquire a first style image.

S902: Extract the feature vector of the first style image; wherein, the latent space of the image generation model is constrained to be a vector dictionary including a preset number of latent vectors.

In the embodiment of the present disclosure, after the first style image is input to the encoder in the image generation model, the encoder extracts the feature vector of the first style image.

S903: From the vector dictionary, determine the hidden vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector.

In this embodiment of the present disclosure, after the feature vector of the first style image is extracted, the latent vector with the smallest distance from the feature vector is determined from the trained vector dictionary, and this latent vector is used as the target vector corresponding to the feature vector. Here, a latent vector refers to a vector in the latent space.

In the embodiment of the present disclosure, after replacing the corresponding feature vector with the target vector, it is passed to the decoder in the image generation model.

S904: Based on the target vector, generate a second style image corresponding to the first style image.

In the embodiment of the present disclosure, the decoder in the image generation model decodes the target vector to obtain a second style image corresponding to the first style image, and the image generation model outputs the second style image.

In the embodiment of the present disclosure, the feature vector corresponding to the first style image is replaced with the closest latent vector in the trained vector dictionary, so that the image generation model can generate a style image with better quality and effect, which meets the user's image style conversion needs and improves the user experience.
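Steps S901 to S904 can be summarized in a short sketch. The encoder and decoder below are toy stand-ins for the trained networks, squared Euclidean distance is assumed, and every name in the block is hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance (an assumed distance measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def generate_style_image(image, encoder, decoder, dictionary):
    """Encode, snap each feature vector to its nearest latent vector, then decode."""
    features = encoder(image)                                    # S902
    targets = [min(dictionary, key=lambda z: squared_distance(f, z))
               for f in features]                                # S903
    return decoder(targets)                                      # S904

# Hypothetical stand-ins for the trained encoder and decoder.
def toy_encoder(image):
    return [image[k:k + 2] for k in range(0, len(image), 2)]

def toy_decoder(vectors):
    return [value for vector in vectors for value in vector]

dictionary = [[0.0, 0.0], [1.0, 1.0]]
second_style = generate_style_image([0.1, 0.2, 0.9, 0.8],
                                    toy_encoder, toy_decoder, dictionary)
```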

FIG. 10 is another schematic diagram of a model training process for constraining the latent space of an image generation model provided by an embodiment of the present disclosure. The image generation model to be trained may be a conditional generative adversarial network (CGAN) that includes a generator and a discriminator, and the CGAN is trained to obtain the trained image generation model.

In the actual training process, the first style image sample is input to the encoder in the generator, and the encoder maps the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample, thereby realizing feature vector extraction. After the feature vector of the first style image sample is obtained, a vector corresponding to the feature vector is randomly determined from the standard normal distribution, the maximum mean discrepancy (MMD) algorithm is used to calculate the difference between the vector and the feature vector, and the difference is determined as a distribution loss value. Based on the distribution loss value, the current vector distribution in the encoder is adjusted and updated, so that it is continuously adjusted toward a vector distribution from which higher-quality style images can be obtained. The MMD algorithm is commonly used to measure the difference between two distributions.

In the embodiment of the present disclosure, after the feature vector is obtained, it is input to the decoder, and the decoder decodes it to obtain the first output image. The second style image sample corresponding to the first style image sample and the first output image are then input to the discriminator together to obtain the loss value, and the parameters in the generator and the discriminator can be adjusted based on the loss value.

According to the above training method, a trained image generation model is finally obtained through multiple rounds of iterative training based on high-quality image samples, which is used to convert the style of the image.

In the training method of the image generation model provided by the embodiment of the present disclosure, the current vector distribution in the encoder is continuously adjusted during training, so that the current vector distribution in the image generation model is constrained toward the standard normal distribution. This achieves the purpose of constraining the latent space of the image generation model, so that the image generation model with the constrained latent space can generate high-quality style images and improve user experience.

Corresponding to the above training method, an embodiment of the present disclosure also provides a flowchart of another model training method for constraining the latent space of an image generation model. Referring to FIG. 11 , the method includes:

S1101: Acquire a first style image sample and a second style image sample having a corresponding relationship.

S1102: Map the first style image sample to the current vector distribution to obtain a feature vector of the first style image sample.

In the embodiment of the present disclosure, when the first style image sample is a line art image sample and the second style image sample is a comic image sample, after the first style image sample is input into the encoder, the feature vector extracted by the encoder may be a P-dimensional feature vector (P is a positive integer, such as 512).

In order to improve the robustness of the image generation model, in this embodiment of the present disclosure, after the first style image sample is input into the image generation model to be trained, image enhancement processing is first performed on the first style image sample based on the target image enhancement method to obtain an enhanced image sample, and the enhanced image sample is then mapped to the current vector distribution to obtain the feature vector of the first style image sample. The target image enhancement method may be one of image enhancement methods such as random dilation, erosion, rotation, translation, scaling, and deformation.

In the embodiment of the present disclosure, during the training process of the image generation model, random image enhancement processing is performed on the image samples, so that the trained image generation model has lower requirements on the quality of the images input to the model. That is to say, the style transfer images with better effect can be obtained for images of different quality. For example, the requirements for the stroke thickness of the line art style image are low, and for the line art style images with different stroke thicknesses, a comic style image with better effect can be generated. As shown in Fig. 8, based on two line art style images of A and B with different qualities, a comic style image with better quality can be obtained.
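As an illustration of how random enhancement can vary stroke thickness, the sketch below applies a 3×3 dilation or erosion to a binary line art image. The 3×3 neighbourhood, the edge handling, the encoding of strokes as 1, and all function names are assumptions made for this example only.

```python
import random

def morph(image, op):
    """Apply op (max or min) over each pixel's 3x3 neighbourhood, replicating edges."""
    h, w = len(image), len(image[0])
    return [[op(image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]

def dilate(image):
    """Thicken strokes (stroke pixels are 1, background pixels are 0)."""
    return morph(image, max)

def erode(image):
    """Thin strokes."""
    return morph(image, min)

def random_stroke_enhancement(image, rng):
    """Randomly thicken, thin, or keep the strokes of a training sample."""
    return rng.choice([dilate, erode, lambda img: img])(image)

line = [[0, 0, 1, 0, 0] for _ in range(5)]   # one-pixel-wide vertical stroke
thick = dilate(line)                          # three-pixel-wide stroke
augmented = random_stroke_enhancement(line, random.Random(0))
```

Training on samples whose stroke width varies in this way is one plausible reason the trained model becomes less sensitive to the stroke thickness of its input.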

S1103: Use the maximum mean discrepancy (MMD) algorithm to update the current vector distribution based on the standard normal distribution and the feature vector.

In the embodiment of the present disclosure, after the feature vector of the first style image sample is obtained, a vector corresponding to the feature vector is randomly determined from the standard normal distribution, and the maximum mean discrepancy algorithm is then used to calculate the difference between the vector and the feature vector. The difference is determined as a distribution loss value. Based on the distribution loss value, the current vector distribution in the encoder is adjusted and updated, so that it is continuously adjusted toward a vector distribution from which higher-quality style images can be obtained.

In a preferred embodiment, the image generation model can be trained on multiple pairs of corresponding image samples at the same time. After a corresponding vector is determined from the standard normal distribution for the feature vector of each first style image sample, the maximum mean discrepancy algorithm is used to calculate the difference between each vector and the corresponding feature vector, and the distribution loss value is determined based on these differences. The current vector distribution in the encoder is then adjusted and updated based on the distribution loss value, which improves the update efficiency of the current vector distribution in the encoder.
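The distribution loss of S1103 can be sketched with a batch estimate of the squared maximum mean discrepancy, the measure the text refers to as the maximum average difference algorithm. The Gaussian kernel, its bandwidth `sigma`, the biased estimator, and the function names are assumptions; the disclosure only names the algorithm.

```python
import math
import random

def rbf_kernel(a, b, sigma):
    """Gaussian kernel between two vectors (an assumed kernel choice)."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two batches of vectors."""
    def mean_kernel(us, vs):
        return sum(rbf_kernel(u, v, sigma) for u in us for v in vs) / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def standard_normal_batch(count, dim, rng):
    """Draw `count` vectors of dimension `dim` from the standard normal distribution."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

rng = random.Random(0)
prior = standard_normal_batch(40, 2, rng)            # vectors drawn from N(0, I)
close_features = standard_normal_batch(40, 2, rng)   # encoder output already near N(0, I)
far_features = [[v + 5.0 for v in vec] for vec in close_features]  # shifted encoder output

loss_close = mmd_squared(close_features, prior)
loss_far = mmd_squared(far_features, prior)
```

A batch of feature vectors that drifts away from the standard normal distribution yields a larger distribution loss, which is the signal used to pull the encoder's current vector distribution back toward it.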

S1104: Based on the feature vector, generate a first output image corresponding to the first style image sample.

S1105: Input the second style image sample and the first output image corresponding to the first style image sample into the discriminator, implement this round of iterative training, and enter the next round of iterative training, until a preset convergence condition is reached, Get the trained image generation model.

In the embodiment of the present disclosure, during the training process of the image generation model, the current vector distribution in the encoder is continuously adjusted, so that the image generation model obtained after training can generate style images with better quality and effect, which meets the user's image style conversion needs and improves the user experience.

Based on the image generation model obtained by the model training method for constraining the latent space of the image generation model, the first style image is converted into the second style image. Specifically, referring to FIG. 12 , which is a flowchart of another style image generation method provided by an embodiment of the present disclosure, the method includes:

S1201: Acquire a first style image.

S1202: Map the first style image into a normal distribution to obtain a feature vector of the first style image; wherein, the latent space of the image generation model is constrained to be a normal distribution.

In the embodiment of the present disclosure, the first style image is input to the encoder of the image generation model, and the encoder maps the first style image to the normal distribution after training to obtain the feature vector of the first style image, and realizes the feature vector extraction.

Since the parameters of the encoder in the embodiment of the present disclosure are constrained to a normal distribution, the input image will, with a high probability, be mapped to the vector corresponding to the origin position of the normal distribution, and a style image generated based on the vector corresponding to the origin position is of relatively good quality. Therefore, in the embodiments of the present disclosure, a high-quality style image can be generated based on the feature vector mapped to the trained normal distribution, so as to improve user experience.

S1203: Based on the feature vector, generate a second style image corresponding to the first style image.

In the embodiment of the present disclosure, after the feature vector is obtained, the feature vector is input into the decoder of the image generation model, the second style image corresponding to the first style image is obtained after decoding by the decoder, and the image generation model outputs the second style image.

In the embodiment of the present disclosure, by mapping the first style image to a feature vector based on the trained normal distribution, a style image with better quality and effect can be generated, which better meets the user's image style conversion needs and improves the user experience.

In addition, since the distance between the feature vector obtained by mapping and the vector corresponding to the origin of the normal distribution can represent the aesthetics of the style image generated based on the feature vector, the aesthetics of the generated style image can be improved by adjusting the distance.

To this end, the embodiments of the present disclosure are provided with a target weight coefficient to represent the distance between the feature vector and the origin of the normal distribution. By adjusting the target weight coefficient, the distance can be changed, thereby affecting the aesthetics of the generated style image.

In an optional implementation manner, the image generation model may update the feature vector of the first style image based on the target weight coefficient to obtain the updated feature vector. Then, based on the updated feature vector, a second style image corresponding to the first style image is generated to change the aesthetics of the style image.

In an optional implementation manner, the target weight coefficient may be adjusted by the user to obtain a style image that meets the user's aesthetic requirements. Specifically, the image generation model receives the user's input operation on the target weight coefficient, obtains the target weight coefficient corresponding to the operation, and then updates the feature vector of the first style image based on the target weight coefficient to obtain the updated feature vector. Finally, based on the updated feature vector, a style image that meets the user's aesthetic requirements is generated.
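One simple way to realize such an update is to scale the feature vector by the target weight coefficient, which changes its distance from the origin proportionally. The disclosure does not specify the update rule, so the multiplication below and the names `apply_target_weight` and `distance_to_origin` are assumptions.

```python
import math

def distance_to_origin(vector):
    """Euclidean distance between a feature vector and the origin."""
    return math.sqrt(sum(v * v for v in vector))

def apply_target_weight(feature, weight):
    """Scale the feature vector; a weight below 1 moves it toward the origin."""
    return [weight * v for v in feature]

feature = [3.0, 4.0]                           # distance 5.0 from the origin
updated = apply_target_weight(feature, 0.5)    # distance 2.5 from the origin
```

The updated vector, rather than the original one, would then be passed to the decoder to generate the second style image.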

Based on the same inventive concept as the above method embodiments, the present disclosure also provides a style image generating apparatus. FIG. 13 is a schematic structural diagram of a style image generating apparatus provided in an embodiment of the present disclosure. The apparatus is applied to an image generation model and includes:

a receiving module 1301, configured to receive a first style image;

A generating module 1302, configured to generate a second style image after performing style conversion processing on the first style image;

In an optional embodiment, the latent space of the image generation model is constrained to a vector dictionary containing a preset number of latent vectors; the generation module 1302 includes:

an extraction submodule for extracting the feature vector of the first style image;

a determining submodule, configured to determine, from the vector dictionary, the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector;

The first generating submodule is configured to generate a second style image corresponding to the first style image based on the target vector.

In an optional embodiment, the latent space of the image generation model is constrained to be a normal distribution; the generation module 1302 includes:

a mapping submodule for mapping the first style image to the normal distribution to obtain a feature vector of the first style image;

The second generating sub-module is configured to generate a second style image corresponding to the first style image based on the feature vector.

In an optional implementation manner, the generating module 1302 further includes:

an update sub-module for updating the feature vector based on the target weight coefficient to obtain the updated feature vector; the target weight coefficient is used to represent the distance between the feature vector and the origin of the normal distribution;

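Correspondingly, the second generating submodule is specifically configured to generate the second style image corresponding to the first style image based on the updated feature vector.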

The obtaining sub-module is configured to obtain the target weight coefficient in response to the input operation of the target weight coefficient.

In the style image generation method provided by the embodiments of the present disclosure, the latent space of the image generation model is constrained during the training process of the image generation model, so that a style image with higher quality and better effect can be generated in the application stage of the image generation model, which improves the user experience.

Based on the same inventive concept as the above method embodiments, the present disclosure also provides an apparatus for training an image generation model. Referring to FIG. 14 , it is a schematic structural diagram of an apparatus for training an image generation model provided in an embodiment of the present disclosure. The device includes:

an acquisition module 1401, configured to acquire a first style image sample and a second style image sample with a corresponding relationship;

The constraint module 1402 is configured to constrain the latent space during the training process based on the first style image samples and the second style image samples having the corresponding relationship to obtain a trained image generation model.

In an optional implementation manner, the constraint module 1402 includes:

an extraction submodule for extracting the feature vector of the first style image sample;

A determination submodule is used to determine, from the vector dictionary, a latent vector with the smallest distance from the feature vector as a target vector corresponding to the feature vector; a preset number of hidden vectors is stored in the vector dictionary;

a first generating submodule, configured to generate a first output image corresponding to the first style image sample based on the target vector;

a first processing submodule, configured to input the second style image sample corresponding to the first style image sample and the first output image into the discriminator, and obtain a loss value after processing by the discriminator;

The first update sub-module is configured to update the current vector dictionary based on the loss value, and enter the next round of iterative training until a preset convergence condition is reached, and a trained image generation model is obtained.

In an optional implementation manner, the constraint module 1402 further includes:

a first enhancement sub-module, configured to perform image enhancement processing on the first style image sample based on the target image enhancement method to obtain an enhanced image sample;

Correspondingly, the extraction submodule is specifically used for:

Feature vectors of the enhanced image samples are extracted.

In an optional implementation manner, the constraint module 1402 includes:

a mapping submodule, configured to map the first style image sample to the current vector distribution after inputting the first style image sample into the image generation model, to obtain a feature vector of the first style image sample;

a second update submodule, configured to update the current vector distribution based on the standard normal distribution and the feature vector by using the maximum mean discrepancy algorithm;

a second generating submodule, configured to generate a first output image corresponding to the first style image sample based on the feature vector;

The training sub-module is used to input the second style image sample and the first output image corresponding to the first style image sample into the discriminator, realize this round of iterative training, and enter the next round of iterative training , until the preset convergence condition is reached, and the trained image generation model is obtained.

a second enhancement sub-module, configured to perform image enhancement processing on the first style image sample based on the target image enhancement method to obtain an enhanced image sample;

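Correspondingly, the mapping submodule is specifically configured to map the enhanced image sample to the current vector distribution to obtain the feature vector of the first style image sample.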

In addition to the above method and apparatus, embodiments of the present disclosure also provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a terminal device, the terminal device is caused to implement the style image generation method or the image generation model training method according to the embodiments of the present disclosure.

In addition, an embodiment of the present disclosure further provides a style image generating device, as shown in FIG. 15 , which may include:

A processor 1501, a memory 1502, an input device 1503 and an output device 1504. The number of processors 1501 in the style image generating device may be one or more, and one processor is taken as an example in FIG. 15 . In some embodiments of the present disclosure, the processor 1501, the memory 1502, the input device 1503, and the output device 1504 may be connected through a bus or other means, wherein the connection through a bus is taken as an example in FIG. 15 .

The memory 1502 can be used to store computer programs and modules, and the processor 1501 executes various functional applications and data processing of the style image generating apparatus by running the computer programs and modules stored in the memory 1502 . The memory 1502 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like. Additionally, memory 1502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. The input device 1503 may be used to receive input numerical or character information, and generate signal input related to user settings and function control of the style image generating apparatus.

Specifically in this embodiment, the processor 1501 loads the executable files corresponding to the processes of one or more computer programs into the memory 1502 according to the following instructions, and the processor 1501 executes the computer programs stored in the memory 1502, thereby realizing each step in the above-mentioned style image generation method.

In addition, an embodiment of the present disclosure also provides a training device for an image generation model, as shown in FIG. 16 , which may include:

A processor 1601, a memory 1602, an input device 1603, and an output device 1604. The number of processors 1601 in the image generation model training device may be one or more, and one processor is taken as an example in FIG. 16 . In some embodiments of the present disclosure, the processor 1601 , the memory 1602 , the input device 1603 and the output device 1604 may be connected by a bus or other means, wherein the connection by a bus is taken as an example in FIG. 16 .

The memory 1602 can be used to store computer programs and modules, and the processor 1601 executes various functional applications and data processing of the image generation model training device by running the computer programs and modules stored in the memory 1602 . The memory 1602 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like. Additionally, memory 1602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. The input device 1603 may be used to receive input numerical or character information, and to generate signal input related to user settings and functional control of the training device for the image generation model.

Specifically in this embodiment, the processor 1601 loads the executable files corresponding to the processes of one or more computer programs into the memory 1602 according to the following instructions, and the processor 1601 executes the computer programs stored in the memory 1602, thereby realizing each step in the above-mentioned training method of an image generation model.

It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that any such actual relationship or sequence exists between these entities or operations. Moreover, the terms "comprising", "including" or any other variation thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article or device comprising a list of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or device that includes the element.

The above descriptions are only specific embodiments of the present disclosure, so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A style image generation method, executed by an image generation model, characterized in that the method comprises:

receiving a first style image;

After performing style conversion processing on the first style image, a second style image is generated;

The image generation model is obtained by training based on the first style image samples and the second style image samples having a corresponding relationship, and the latent space of the image generation model is constrained during the training process.
The method according to claim 1, wherein the latent space of the image generation model is constrained to a vector dictionary containing a preset number of latent vectors; and the generating a second style image after performing style conversion processing on the first style image comprises:

extracting the feature vector of the first style image;

From the vector dictionary, determine the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector;

Based on the target vector, a second style image corresponding to the first style image is generated.
The method according to claim 1, wherein the latent space of the image generation model is constrained to be a normal distribution; and the generating a second style image after performing style conversion processing on the first style image comprises:

mapping the first style image to the normal distribution to obtain a feature vector of the first style image;

Based on the feature vector, a second style image corresponding to the first style image is generated.
The method according to claim 3, characterized in that before generating the second style image corresponding to the first style image based on the feature vector, the method further comprises:

The feature vector is updated based on the target weight coefficient to obtain the updated feature vector; the target weight coefficient is used to represent the distance between the feature vector and the origin of the normal distribution;

Correspondingly, generating a second style image corresponding to the first style image based on the feature vector includes:

Based on the updated feature vector, a second style image corresponding to the first style image is generated.
The method according to claim 4, wherein before the feature vector is updated based on the target weight coefficient to obtain the updated feature vector, the method further comprises:

The target weight coefficient is acquired in response to an input operation of the target weight coefficient.
The method according to any one of claims 1-5, wherein the first style image comprises a line art style image, and the second style image comprises a comic style image.
A training method for an image generation model, characterized in that the method comprises:

obtaining image samples of the first style and image samples of the second style that have a corresponding relationship;

In the process of training based on the first style image samples and the second style image samples having the corresponding relationship, the latent space is constrained to obtain a trained image generation model.
The method according to claim 7, wherein the constraining the latent space in the process of training based on the first style image samples and the second style image samples having the corresponding relationship, to obtain a trained image generation model, comprises:

extracting the feature vector of the first style image sample;

determining, from a vector dictionary, the latent vector with the smallest distance from the feature vector as the target vector corresponding to the feature vector, wherein a preset number of latent vectors are stored in the vector dictionary;

based on the target vector, generating a first output image corresponding to the first style image sample;

Inputting the second style image sample and the first output image corresponding to the first style image sample into the discriminator, and after processing by the discriminator, a loss value is obtained;

The vector dictionary is updated based on the loss value, and the next round of iterative training is entered until a preset convergence condition is reached, and a trained image generation model is obtained.
The method according to claim 8, wherein before the extracting the feature vector of the first style image sample, the method further comprises:

Perform image enhancement processing on the first style image sample based on the target image enhancement method to obtain an enhanced image sample;

Correspondingly, the extracting the feature vector of the first style image sample includes:

Feature vectors of the enhanced image samples are extracted.
The method according to claim 7, wherein the constraining the latent space in the process of training based on the first style image samples and the second style image samples having the corresponding relationship, to obtain a trained image generation model, comprises:

receiving the first style image sample, mapping the first style image sample to the current vector distribution, and obtaining a feature vector of the first style image sample;

updating the current vector distribution, using a maximum mean discrepancy (MMD) algorithm, based on the standard normal distribution and the feature vector;

generating a first output image corresponding to the first style image sample based on the feature vector;

inputting the second style image sample and the first output image corresponding to the first style image sample into a discriminator to complete the current round of iterative training, and entering the next round of iterative training until a preset convergence condition is reached, to obtain the trained image generation model.
The method according to claim 10, wherein before mapping the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample, the method further comprises:
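The distribution-matching step recited above can be sketched as follows, assuming the algorithm referred to is maximum mean discrepancy (MMD). The sketch computes a biased estimate of the squared MMD between a batch of feature vectors and samples drawn from a standard normal distribution. The Gaussian kernel, its bandwidth, the batch size, and all function names are assumptions for illustration; the claim does not specify them. In training, a value of this kind would typically serve as a penalty that pulls the current vector distribution toward the standard normal distribution.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors; the kernel choice is an assumption."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


rng = random.Random(0)
dim, batch = 2, 100

# Samples from the standard normal distribution (the target of the constraint).
prior_samples = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]

# Hypothetical feature vectors: one batch already close to the prior, one shifted away.
matched_features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
shifted_features = [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(batch)]

mmd_matched = mmd_squared(matched_features, prior_samples)
mmd_shifted = mmd_squared(shifted_features, prior_samples)
```

The shifted batch yields a larger value than the matched batch, so minimizing this quantity moves the feature vectors toward the standard normal distribution.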

performing image enhancement processing on the first style image sample based on a target image enhancement method to obtain an enhanced image sample;

correspondingly, mapping the first style image sample to the current vector distribution to obtain the feature vector of the first style image sample comprises:

mapping the enhanced image sample to the current vector distribution to obtain the feature vector of the first style image sample.
A style image generation device, characterized in that, the device is applied to an image generation model, and the device includes:

a receiving module for receiving the first style image;

a generating module, configured to generate a second style image after performing style conversion processing on the first style image;

wherein the image generation model is obtained by training based on first style image samples and second style image samples having a corresponding relationship, and the latent space of the image generation model is constrained during the training process.
An apparatus for training an image generation model, wherein the apparatus comprises:

an acquisition module, configured to acquire a first style image sample and a second style image sample that have a corresponding relationship;

a constraint module, configured to constrain the latent space in the process of training based on the first style image sample and the second style image sample having the corresponding relationship, to obtain a trained image generation model.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions, and when the instructions are executed on a terminal device, the terminal device is caused to implement the method according to any one of claims 1-6 or the method according to any one of claims 7-11.
A device, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the method according to any one of claims 1-6 or the method according to any one of claims 7-11 is implemented.