CN113409342A - Training method and device for image style migration model and electronic equipment

Training method and device for image style migration model and electronic equipment

Info

Publication number
CN113409342A
CN113409342A (application CN202110519685.0A)
Authority
CN
China
Prior art keywords
image
training
discriminator
generator
style migration
Prior art date
Legal status
Pending
Application number
CN202110519685.0A
Other languages
Chinese (zh)
Inventor
方慕园
万鹏飞
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110519685.0A
Publication of CN113409342A
Status: Pending

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 7/00 Image analysis > G06T 7/10 Segmentation; Edge detection > G06T 7/12 Edge-based segmentation
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 5/00 Image enhancement or restoration > G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/10 Image acquisition modality > G06T 2207/10004 Still image; Photographic image
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details > G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a training method and apparatus for an image style migration model, and an electronic device. The method includes: acquiring an original image and a real image; inputting the original image into a generator of the image style migration model to obtain a stylized image; extracting a first edge region of a target object in the stylized image and a second edge region of the target object in the real image; training the generator and a discriminator of the image style migration model according to the first edge region and the second edge region; and taking the trained generator as the image style migration model. In the application, the training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.

Description

Training method and device for image style migration model and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of training of image style migration models, in particular to a training method and device of an image style migration model, a stylized image generation method and device, electronic equipment, a computer storage medium and a computer program product.
Background
An image stylization task can provide users with various stylized images. For example, in a portrait hair-dyeing task, the hair region needs to undergo color conversion while the other regions remain unchanged. Such a task may segment the image into different regions, perform style conversion on each region separately, and then stitch the regions back together, so the quality of the stylized image depends on how well the stitched regions are fused.
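Purely for illustration (this sketch is not part of the application, and the function and variable names are hypothetical), region-wise stylization followed by stitching can be expressed as simple mask-based compositing; the seam along the mask boundary is exactly where the fusion quality discussed here matters:

    import numpy as np

    def composite_stylized_region(original, stylized, mask):
        """Paste a stylized region (e.g. recolored hair) back onto the original image.

        original, stylized: H x W x 3 float arrays in [0, 1]
        mask: H x W float array in [0, 1]; 1 inside the region to restyle, 0 elsewhere.
        """
        mask = mask[..., None]                              # broadcast the mask over the channels
        return mask * stylized + (1.0 - mask) * original    # stitch the two images along the mask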
In the related art, the generator of a Generative Adversarial Network (GAN) model can be used to perform the stylization and obtain a stylized image, the edge position of the stylized region is marked in the stylized image, and the discriminator of the GAN model then evaluates the edge fusion effect of the stylized region in the stylized image; through adversarial (game-style) training between the generator and the discriminator, the stylization effect provided by the generator is improved.
However, in the current scheme the discriminator needs to process the whole image, which consumes excessive computing resources. In addition, because the edge position occupies only a small fraction of the whole image, the discriminator has difficulty capturing differences at the edge position, which makes training difficult.
Disclosure of Invention
The embodiment of the application provides a training method and apparatus for an image style migration model, a stylized image generation method and apparatus, an electronic device, a computer storage medium and a computer program product, to address the problems in the related art that the discriminator must process the whole image, which consumes excessive computing resources, and that, because the edge position occupies only a small fraction of the whole image, the discriminator has difficulty capturing differences at the edge position, which makes training difficult.
In a first aspect, an embodiment of the present application provides a method for training an image style migration model, where the method includes:
acquiring an original image and a real image, wherein the original image and the real image both comprise a target object, and the style attribute of the target object contained in the original image is different from the style attribute of the target object contained in the real image;
inputting the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator;
extracting a first edge area of the target object in the stylized image and a second edge area of the target object in the real image;
and training the generator and the discriminator according to the first edge area and the second edge area, and taking the trained generator as an image style migration model.
In an optional implementation, the training the generator and the discriminator according to the first edge region and the second edge region, and using the trained generator as an image style migration model includes:
performing image segmentation on the first edge area to obtain a plurality of first image blocks with preset sizes;
performing image segmentation on the second edge area to obtain a plurality of second image blocks with the preset size;
and training the generator and the discriminator according to the first image block and the second image block, and taking the trained generator as the image style migration model.
In an alternative embodiment, the training the generator and the discriminator based on the first image patch and the second image patch includes:
training the discriminator according to the first image block and the second image block;
and training the generator according to the first image block and the discriminator.
In an alternative embodiment, the training the discriminator according to the first image block and the second image block includes:
inputting the first image block and the second image block into the discriminator respectively to obtain a first judgment result corresponding to the first image block and a second judgment result corresponding to the second image block;
determining a first loss value according to the difference between a first true value of the first image block and the first judgment result, and determining a second loss value according to the difference between a true value of the second image block and the second judgment result; the first true value is the target value corresponding to the discriminator judging the input object to be false; the true value of the second image block is the target value corresponding to the discriminator judging the input object to be true;
and alternately training the discriminator according to the first loss value and the second loss value.
In an alternative embodiment, the training the generator according to the first image block and the discriminator includes:
inputting the first image block into the discriminator to obtain a third judgment result corresponding to the first image block;
determining a third loss value according to the difference between a second true value of the first image block and the third judgment result; the second true value is the target value corresponding to the discriminator judging the input object to be true;
training the generator according to the third loss value.
In an optional implementation, the training the generator and the discriminator according to the first edge region and the second edge region, and using the trained generator as an image style migration model includes:
and after the generator and the discriminator are subjected to iterative training operation for a preset number of times, terminating the training, and taking the trained generator as an image style migration model, wherein one iterative training operation comprises training the generator once and training the discriminator once.
In a second aspect, an embodiment of the present application provides a stylized image generating method, including:
inputting an image to be processed into an image style migration model to obtain a stylized image output by the image style migration model;
the image style migration model is obtained by training according to a training method of the image style migration model.
In a third aspect, an embodiment of the present application provides a device for training an image style migration model, where the device includes:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is configured to acquire an original image and a real image, the original image and the real image both contain a target object, and the style attribute of the target object contained in the original image is different from the style attribute of the target object contained in the real image;
the first style module is configured to input the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator;
a segmentation module configured to extract a first edge region of the target object in the stylized image and a second edge region of the target object in the real image;
a training module configured to train the generator and the discriminator according to the first edge region and the second edge region, and use the trained generator as an image style migration model.
In an alternative embodiment, the training module comprises:
the first segmentation sub-module is configured to perform image segmentation on the first edge area to obtain a plurality of first image blocks with preset sizes;
the second segmentation sub-module is configured to perform image segmentation on the second edge area to obtain a plurality of second image blocks of the preset size;
a training sub-module configured to train the generator and the discriminator according to the first image patch and the second image patch, and use the trained generator as the image style migration model.
In an alternative embodiment, the training submodule includes:
a first training unit configured to train the discriminator according to the first image block and the second image block;
a second training unit configured to train the generator according to the first image block and the discriminator.
In an alternative embodiment, the first training unit comprises:
the first training subunit is configured to input the first image block and the second image block into the discriminator respectively to obtain a first determination result corresponding to the first image block and a second determination result corresponding to the second image block;
the second training subunit is configured to determine a first loss value according to the difference between a first true value of the first image block and the first determination result, and determine a second loss value according to the difference between a true value of the second image block and the second determination result; the first true value is the target value corresponding to the discriminator judging the input object to be false; the true value of the second image block is the target value corresponding to the discriminator judging the input object to be true;
an alternating training subunit configured to alternately train the discriminator according to the first loss value and the second loss value.
In an alternative embodiment, the second training unit comprises:
a third training subunit, configured to input the first image block into the discriminator to obtain a third determination result corresponding to the first image block;
a fourth training subunit, configured to determine a third loss value according to the difference between a second true value of the first image block and the third determination result; the second true value is the target value corresponding to the discriminator judging the input object to be true;
a training subunit configured to train the generator according to the third loss value.
In an alternative embodiment, the training module comprises:
and the iterative training submodule is configured to terminate training after performing iterative training operation on the generator and the discriminator for a preset number of times, and use the trained generator as an image style migration model, wherein one iterative training operation comprises training the generator once and training the discriminator once.
In a fourth aspect, an embodiment of the present application provides a stylized image generating apparatus, including:
the style processing module is configured to input an image to be processed into an image style migration model to obtain a stylized image output by the image style migration model;
the image style migration model is obtained according to a training device of the image style migration model.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the training method for the image style migration model and the stylized image generation method.
In a sixth aspect, the present application further provides a computer storage medium, where instructions in the computer storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for training an image style migration model and generating a stylized image.
In a seventh aspect, an embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for training an image style migration model and generating a stylized image is implemented.
In the embodiment of the application, a first edge region of the target object in the stylized image and a second edge region of the target object in the real image can be extracted. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating steps of a method for training an image style migration model according to an embodiment of the present disclosure;
FIG. 2 is an architecture diagram of a generative adversarial network model provided by an embodiment of the present application;
FIG. 3 is a stylized image provided by an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of a method for generating a stylized image according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating steps of another method for training an image style migration model according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of edge region blocking provided in an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for training an image style migration model according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a stylized image generation apparatus provided in an embodiment of the present application;
FIG. 9 is a logical block diagram of an electronic device of one embodiment of the present application;
fig. 10 is a logic block diagram of an electronic device of another embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flowchart illustrating steps of a method for training an image style migration model according to an embodiment of the present application, where as shown in fig. 1, the method includes:
step 101, acquiring an original image and a real image.
The original image and the real image both comprise target objects, and the style attributes of the target objects contained in the original image are different from the style attributes of the target objects contained in the real image.
The training refers to that of a generative adversarial network (GAN) model. Fig. 2 shows an architecture diagram of the generative adversarial network model provided in the embodiment of the present application. The GAN model includes a generator and a discriminator; it is a neural network model that can produce a desirable output through adversarial (game-style) learning between the generator and the discriminator.
Specifically, the training data for training the image style migration model may include a plurality of real images and original images, where the original images and the real images both include target objects, and style attributes of the target objects included in the original images are different from style attributes of the target objects included in the real images.
For example, for the stylization processing for changing the hair color to pale yellow hair color, a plurality of portrait images of black hair may be prepared as original images, and a plurality of portrait images of natural pale yellow hair color may be prepared as real images.
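As a minimal sketch of how such unpaired training data could be loaded (PyTorch is assumed; the folder layout, class name and transform below are illustrative, not prescribed by the application):

    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class UnpairedStyleDataset(Dataset):
        """Unpaired original images (e.g. black hair) and real images (e.g. pale yellow hair)."""

        def __init__(self, original_dir, real_dir, transform=None):
            self.original_paths = sorted(os.path.join(original_dir, f) for f in os.listdir(original_dir))
            self.real_paths = sorted(os.path.join(real_dir, f) for f in os.listdir(real_dir))
            self.transform = transform

        def __len__(self):
            # the two sets are unpaired, so simply cycle through the shorter one
            return max(len(self.original_paths), len(self.real_paths))

        def __getitem__(self, idx):
            original = Image.open(self.original_paths[idx % len(self.original_paths)]).convert("RGB")
            real = Image.open(self.real_paths[idx % len(self.real_paths)]).convert("RGB")
            if self.transform is not None:
                original, real = self.transform(original), self.transform(real)
            return original, real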
Step 102, inputting the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator.
Further, after the training data is prepared, the original image may be stylized by the generator under the GAN architecture to obtain a stylized image. For example, assuming the generator provides a stylization process that changes the hair color to pale yellow, the hair region in the stylized image obtained after the stylization processing carries a pale yellow style feature.
Step 103, extracting a first edge region of the target object in the stylized image and a second edge region of the target object in the real image.
In order to further improve the fusion effect between the style region and the other regions in the stylized image, the training of the generator and the discriminator under the GAN architecture may focus on the edge region between the style region and the other regions and reduce interference from regions outside the edge region. Referring to fig. 1, the embodiment of the present application may extract a first edge region of the target object in the stylized image and a second edge region of the target object in the real image. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
For example, referring to fig. 3, which shows a stylized image provided in the embodiment of the present application, assuming that the region 10 is stylized in the stylized image, the region 10 may be a style region of a target object, and a first edge region 30 (a region covered by a dotted line) at a boundary between the region 10 and an adjacent region 20 is extracted. The same applies to the extraction of the second edge region of the real image, wherein if the style region of the target object is a hair region, the target region of the target object in the real image is also a hair region.
And 104, training the generator and the discriminator according to the first edge area and the second edge area, and taking the trained generator as an image style migration model.
In the embodiment of the application, at the initial stage of training the parameters of the generator are not yet well trained, so the quality of the stylized image produced by the generator is poor and the fusion effect of the first edge region in the stylized image is poor. The real image, by contrast, is collected in advance and already matches the ideal output expected of the generator, so the stylization of the target region corresponding to the style region in the real image is ideal and the fusion effect of the second edge region is good.
Therefore, in one training implementation, referring to fig. 1, in one iteration the first edge region and the second edge region may be used as input, and the parameters of the discriminator are trained with the goal that the discriminator recognizes the first edge region as "false" and the second edge region as "true". Specifically, the output of the discriminator takes two results, "true" and "false", where "true" means the edge region is highly similar to an ideal edge region and "false" means it is not. When training the discriminator, the first edge region in the stylized image output by the generator is treated as a fake region with poor fusion quality, and the second edge region in the real image is treated as a real region with a natural fusion effect.
After the parameters of the discriminator are trained, the discriminator's ability to distinguish true edge regions from false ones improves, and the generator then needs further training. When training the generator, the first edge region in the stylized image output by the generator should be able to fool the discriminator, that is, be recognized by the discriminator as a real region with a natural fusion effect. The training target of the generator is therefore that the first edge region in the stylized image output by the generator can be judged as true by the discriminator. To this end, the first edge region of the stylized image obtained by the generator from the original image is input into the discriminator, a loss value is calculated from the discriminator's output value and the true value (here the true value corresponds to the discriminator recognizing the first edge region as true), and the parameters of the generator are trained using this loss value and a loss function. After training, the fusion quality of the first edge region in the stylized image generated by the generator is improved.
When the next round of iteration starts, the parameters of the generator and the discriminator have already been optimized, and they are optimized further after that round of iteration finishes. The iteration terminates once the preset number of iterations is reached, and the final generator is used as the image style migration model.
It should be noted that, in another training implementation of the embodiment of the present application, the generator may be trained first and the discriminator trained afterwards; the actual training order is not limited in the embodiment of the present application.
The embodiment of the application uses adversarial (game-style) learning between the generator and the discriminator under the GAN architecture to refine the model parameters of both and to produce high-quality output. After the image style migration model is established, it can be embedded into an image processing pipeline, for example on a mobile terminal, a server, or in the cloud, to provide a high-quality stylization function: a user can use an image to be processed as the input of the image style migration model, stylized features are added to the image to be processed, and a stylized image with high stylization quality and fusion quality is obtained.
Optionally, step 104 may specifically include:
and a substep 1041 of terminating the training after performing iterative training operations for a preset number of times on the generator and the discriminator, and using the trained generator as an image style migration model, wherein one iterative training operation includes training the generator once and training the discriminator once.
In the embodiment of the application, each iterative training operation includes training the generator once and training the discriminator once, so that the parameters of both are optimized. To obtain good model parameters, multiple rounds of iterative training are required for the parameters to reach their expected values. Once the preset number of iterative training operations is reached, the parameters are considered to have reached their expected values, the iteration is terminated, and the final generator is used as the image style migration model.
In addition, the preset number of iterative training operations can be set according to actual requirements. In one approach, after the preset number is reached, the similarity between the stylized image output by the generator and a standard style image is calculated; if the similarity is greater than a set similarity threshold (for example 90%), the training target is considered met and the target generator is obtained; if the similarity is smaller than the threshold, the training target is considered not yet met, and additional iterative training operations can be added until the similarity meets the requirement. In another approach, after the preset number is reached, whether the training target has been met can be judged by determining whether the loss values of the generator and the discriminator are within a preset range.
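A minimal sketch of such a stopping check (the actual similarity metric is not specified in the application; the mean-absolute-difference similarity and the 0.9 threshold below are placeholders):

    import numpy as np

    def mean_similarity(img_a, img_b):
        """Toy similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
        a = np.asarray(img_a, dtype=np.float32) / 255.0
        b = np.asarray(img_b, dtype=np.float32) / 255.0
        return float(1.0 - np.abs(a - b).mean())

    def training_finished(stylized, reference_style_image, threshold=0.9):
        """Stop once the stylized output is similar enough (e.g. 90%) to the standard style image."""
        return mean_similarity(stylized, reference_style_image) >= threshold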
In summary, according to the training method for the image style migration model provided in the embodiment of the present application, the first edge region of the target object in the stylized image and the second edge region of the target object in the real image may be extracted. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
Fig. 4 is a flowchart illustrating steps of a stylized image generating method according to an embodiment of the present application, where as shown in fig. 4, the method may include:
step 201, inputting an image to be processed into an image style migration model, and obtaining a stylized image output by the image style migration model.
The image style migration model is obtained according to the training method of the image style migration model provided in fig. 1.
In the embodiment of the application, after the image style migration model is established, it can be embedded into an image processing pipeline, for example on a mobile terminal, a server, or in the cloud, to provide a high-quality stylization function: a user can use the image to be processed as the input of the image style migration model, stylized features are added to the image to be processed, and a stylized image with high stylization quality and fusion quality is obtained.
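For example, a minimal inference sketch (PyTorch is assumed; the checkpoint format, input size and preprocessing below are placeholders, not taken from the application):

    import torch
    from PIL import Image
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize((256, 256)),   # placeholder input size
        transforms.ToTensor(),
    ])
    to_image = transforms.ToPILImage()

    def stylize(model_path, image_path, device="cpu"):
        # assumes the trained generator was exported as a TorchScript module
        generator = torch.jit.load(model_path, map_location=device)
        generator.eval()
        x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            stylized = generator(x)      # stylized image output by the image style migration model
        return to_image(stylized.squeeze(0).clamp(0, 1).cpu())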
In summary, according to the training method for the image style migration model provided in the embodiment of the present application, the first edge region of the target object in the stylized image and the second edge region of the target object in the real image may be extracted. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
Fig. 5 is a flowchart illustrating steps of another method for training an image style migration model according to an embodiment of the present application, as shown in fig. 5, including:
and 301, acquiring an original image and a real image.
The original image and the real image both comprise target objects, and the style attributes of the target objects contained in the original image are different from the style attributes of the target objects contained in the real image;
the implementation manner of this step is similar to the implementation process of step 101 described above, and this embodiment of the present application is not described in detail here.
Step 302, inputting the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator.
The implementation of this step is similar to that of step 102 described above, and details are not repeated here.
Step 303, extracting a first edge region of the target object in the stylized image and a second edge region of the target object in the real image.
The implementation of this step is similar to that of step 103 described above, and details are not repeated here.
Optionally, step 303 may specifically include:
substep 3031, determining a first boundary line at the boundary between the style region and the adjacent region of the target object in the stylized image, and a second boundary line at the boundary between the target region and the adjacent region of the target object in the real image.
In the embodiment of the present application, before the edge region in the image is extracted, a first boundary line at a boundary between the style region and the adjacent region in the stylized image and a second boundary line at a boundary between the target region and the adjacent region in the real image may be determined. For example, referring to fig. 3, the first boundary line in the stylized image is the boundary line between the region 10 as the stylized region and the adjacent region 20.
Substep 3032, in the stylized image, dividing the first boundary line as a center line of the first edge region to obtain a first edge region with a preset width.
In the embodiment of the present application, after the first boundary line is determined, the first boundary line may be used as the center line of the first edge region, and the first edge region may be obtained by extending a preset pixel distance from the first boundary line in the two directions away from it.
Substep 3033, in the real image, dividing the second boundary line as a center line of the second edge region to obtain the second edge region with the preset width.
In the embodiment of the present application, after the second boundary line is determined, the second boundary line may be used as the center line of the second edge region, and the second edge region may be obtained by extending a preset pixel distance from the second boundary line in the two directions away from it. The specific value of the preset pixel distance can be set according to actual requirements and is not limited in the embodiment of the application.
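As an illustrative sketch of extracting such a band (OpenCV is assumed; building the band from dilation and erosion of the region mask is one possible realization, not the only one):

    import cv2
    import numpy as np

    def extract_edge_band(region_mask, half_width):
        """Return a binary band of roughly 2 * half_width pixels centred on the region boundary.

        region_mask: H x W uint8 array, 255 inside the style/target region, 0 outside.
        half_width:  preset pixel distance to extend on each side of the boundary line.
        """
        kernel = np.ones((2 * half_width + 1, 2 * half_width + 1), np.uint8)
        dilated = cv2.dilate(region_mask, kernel)   # grows the region outwards by about half_width
        eroded = cv2.erode(region_mask, kernel)     # shrinks the region inwards by about half_width
        # pixels inside the dilated mask but outside the eroded mask form the edge band
        return cv2.subtract(dilated, eroded)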
And step 304, performing image segmentation on the first edge area to obtain a plurality of first image blocks with preset sizes.
Step 305, performing image segmentation on the second edge area to obtain a plurality of second image blocks of the preset size.
In the embodiment of the present application, the first edge region and the second edge region are scattered, long, narrow and irregular, whereas the discriminator handles rectangular images better. The first edge region and the second edge region may therefore be further segmented into a number of first image blocks and second image blocks of m x n pixels, each of which forms a rectangular image, so that the first image blocks and the second image blocks can subsequently be used as training inputs of the discriminator, satisfying the discriminator's preference for rectangular images. The specific values of the pixel size m x n of the first image blocks and the second image blocks may be set according to actual requirements, which is not limited in this embodiment of the application.
For example, referring to fig. 6, which illustrates an edge area blocking diagram provided in an embodiment of the present application, for a first edge area 30 in the stylized image illustrated in fig. 3, the first edge area may be divided into a plurality of rectangular first image blocks 31 through an image division operation, where the pixel size of each first image block 31 is the same.
Furthermore, after the first edge area and the second edge area are divided into smaller image blocks, the subsequent training process of the discriminator and the generator can focus on recognizing the fusion quality in the image blocks with finer granularity, and the training precision of the discriminator and the generator is further improved.
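One possible way to cut the narrow, irregular edge band into rectangular m x n blocks, sketched with NumPy (the stride and the rule that a block must overlap the band are assumptions):

    import numpy as np

    def extract_edge_patches(image, edge_band, patch_h, patch_w, stride=None):
        """Crop patch_h x patch_w blocks that overlap the edge band.

        image:     H x W x C array (stylized or real image).
        edge_band: H x W binary array (e.g. the output of extract_edge_band above).
        """
        if stride is None:
            stride = patch_h                        # non-overlapping blocks by default
        h, w = edge_band.shape
        patches = []
        for top in range(0, h - patch_h + 1, stride):
            for left in range(0, w - patch_w + 1, stride):
                if edge_band[top:top + patch_h, left:left + patch_w].any():
                    patches.append(image[top:top + patch_h, left:left + patch_w])
        if not patches:
            return np.empty((0, patch_h, patch_w, image.shape[2]), dtype=image.dtype)
        return np.stack(patches)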
Step 306, training the generator and the discriminator according to the first image block and the second image block, and using the trained generator as the image style migration model.
In this embodiment of the present application, specifically, the first image blocks contained in the first edge region and the second image blocks contained in the second edge region are used as inputs to train the parameters of the discriminator. The first image blocks are then used as inputs of the discriminator, a loss value is calculated from the discriminator's output value and the true value (here the true value corresponds to the discriminator recognizing the first edge region as "true"), and the parameters of the generator are trained using this loss value and the loss function. Because training is performed on image blocks segmented from the edge region, each image block reflects part of the fusion detail in the edge region, so the image style migration model captures the fusion details of the edge region better during training, further improving the training effect.
Optionally, in an implementation manner, step 306 may include:
substep 3061, training the discriminator according to the first image patch and the second image patch.
Substep 3062, training the generator according to the first image patch and the discriminator.
In one implementation, for one iteration, the parameters of the discriminator are first trained using the first image blocks and the second image blocks, with the goal that the discriminator recognizes the first image blocks as "false" and the second image blocks as "true". The first image blocks are then input into the discriminator, whose capability has been improved by this training; with the goal that the discriminator judges the first image blocks as "true", the parameters of the generator are trained using the output value of the improved discriminator, the true value and the loss function. After training, the fusion quality of the first edge region in the stylized image generated by the generator is improved.
After multiple iterations, both the generator and the discriminator further approach the training target.
Optionally, in another implementation, step 306 may include:
substep 3063, training the generator according to the first image patch and the discriminator.
Substep 3064, training the discriminator according to the first image patch and the second image patch.
In another implementation, for one iteration, the first image blocks may first be input into the discriminator; with the goal that the discriminator judges the first image blocks as "true", the parameters of the generator are trained using the output value of the discriminator, the true value and the loss function, and after training the fusion quality of the first edge region in the stylized image generated by the generator is improved. Then, with the goal that the discriminator recognizes the first image blocks as "false" and the second image blocks as "true", the parameters of the discriminator are trained using the first image blocks and the second image blocks, improving the discriminator's ability to distinguish true edge regions from false ones.
After multiple iterations, both the generator and the discriminator further approach the training target.
Optionally, the training of the discriminator according to the first image block and the second image block may specifically be implemented by the following sub-steps:
and a substep a1 of inputting the first image block and the second image block to the discriminator respectively to obtain a first determination result corresponding to the first image block and a second determination result corresponding to the second image block.
Substep a2, determining a first loss value according to the difference between a first true value of the first image block and the first determination result, and determining a second loss value according to the difference between a true value of the second image block and the second determination result; the first true value is the target value corresponding to the discriminator judging the input object to be false; the true value of the second image block is the target value corresponding to the discriminator judging the input object to be true.
Substep a3 alternately trains the arbiter based on the first loss value and the second loss value.
Specifically, the discriminator outputs a real number in the range 0 to 1. An output value between 0.5 and 1 indicates that the input is judged to be true, and an output value between 0 and 0.5 indicates that the input is judged to be false.
In the process of training the discriminator, the training target for the first image block is that the discriminator recognizes it as false, and the training target for the second image block is that the discriminator recognizes it as true. The first true value corresponding to the first image block is therefore 0, and the true value corresponding to the second image block is 1, so the first loss value of the first image block is the difference between the first true value 0 and the first result corresponding to the first image block, and the second loss value of the second image block is the difference between the true value 1 and the second result corresponding to the second image block. The parameters of the discriminator are trained alternately using the first loss value and the second loss value, and after multiple rounds of iteration the discriminator's ability to distinguish true image blocks from false ones improves. Alternate training means that in one pass the parameters of the discriminator are trained using the first loss value, and in the next pass they are trained using the second loss value.
The loss function is a function that maps the value of a random variable to a non-negative real number to represent the "risk" or "loss" of the random event. In practice, the loss function is usually associated with an optimization problem as the learning criterion, that is, the model is solved and evaluated by minimizing the loss function. During model training, a suitable loss function can be selected according to actual requirements, and the embodiment of the application does not limit the specific choice of the first loss function.
In the training process, a stochastic gradient descent (SGD) algorithm can be adopted to optimize training. In each of a number of iterations, stochastic gradient descent randomly draws a group of data from the training data, updates the parameters according to the gradient computed on that group, then draws another group and updates again; when the amount of training data is extremely large, a model whose loss value is within an acceptable range can be obtained without training on all samples. "Stochastic" here means that the input data are randomly shuffled in each iteration, which reduces the bias in parameter updates caused by the order of the input data. After each successive iteration, the loss value between the discriminator's output result and the true value decreases along the gradient, and when the loss value is within the preset range the training of the image style migration model can be considered finished.
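A minimal PyTorch-style sketch of the alternating discriminator update described above (the binary cross-entropy loss, the SGD optimizer and the toy patch discriminator in the comments are reasonable assumptions rather than requirements of the application):

    import torch
    import torch.nn as nn

    bce = nn.BCELoss()   # assumes the discriminator ends with a Sigmoid, so its output lies in [0, 1]

    def train_discriminator_step(discriminator, fake_patches, real_patches, optimizer):
        """Two alternating updates: first on fake (first) image blocks, then on real (second) blocks."""
        # first image blocks: true value 0, i.e. the discriminator should judge them "false"
        optimizer.zero_grad()
        pred_fake = discriminator(fake_patches.detach())            # detach: do not update the generator here
        loss_fake = bce(pred_fake, torch.zeros_like(pred_fake))     # first loss value
        loss_fake.backward()
        optimizer.step()

        # second image blocks: true value 1, i.e. the discriminator should judge them "true"
        optimizer.zero_grad()
        pred_real = discriminator(real_patches)
        loss_real = bce(pred_real, torch.ones_like(pred_real))      # second loss value
        loss_real.backward()
        optimizer.step()
        return loss_fake.item(), loss_real.item()

    # Example setup (hypothetical): a toy discriminator over 3 x 64 x 64 blocks, trained with SGD.
    # discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
    # optimizer = torch.optim.SGD(discriminator.parameters(), lr=1e-3)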
Optionally, the training of the generator according to the first image block and the discriminator may specifically be implemented by the following sub-steps:
and a substep B1 of inputting the first image block to the discriminator to obtain a third determination result corresponding to the first image block.
Substep B2, determining a third loss value according to the difference between the second true value of the first image block and the third determination result; the second true value is the target value corresponding to the discriminator judging the input object to be true.
Substep B3, training the generator according to the third loss value.
In the generator training process, the training target is that the first image blocks in the stylized image output by the generator can be recognized as true by the discriminator, so the second true value corresponding to the first image block is 1. The third loss value of the first image block is therefore the difference between the second true value 1 and the third result corresponding to the first image block, and the third loss value and the second loss function are used to train the parameters of the generator. After multiple rounds of iteration, the generator's ability to output stylized images whose edge regions have high fusion quality improves; once the preset number of iterations is reached, the training of the image style migration model is finished and the target generator is obtained.
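A corresponding sketch of the generator update (again assuming a binary cross-entropy loss; patch_fn stands for a hypothetical differentiable helper, for example tensor crops along the edge band, that produces the first image blocks from the stylized image while keeping the gradient path to the generator):

    import torch
    import torch.nn as nn

    bce = nn.BCELoss()

    def train_generator_step(generator, discriminator, original_images, patch_fn, optimizer):
        """Update the generator so that its edge blocks are judged "true" (target 1) by the discriminator."""
        optimizer.zero_grad()
        stylized = generator(original_images)        # stylized image produced from the original image
        fake_patches = patch_fn(stylized)            # first image blocks taken from the first edge region
        pred = discriminator(fake_patches)           # third judgment result
        loss = bce(pred, torch.ones_like(pred))      # third loss value: distance from the second true value 1
        loss.backward()                              # gradients flow through the discriminator into the generator
        optimizer.step()                             # only the generator's optimizer is stepped here
        return loss.item()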
In summary, according to the training method for the image style migration model provided in the embodiment of the present application, the first edge region of the target object in the stylized image and the second edge region of the target object in the real image may be extracted. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
Fig. 7 is a block diagram of an apparatus for training an image style migration model according to an embodiment of the present application, and as shown in fig. 7, the apparatus includes: an acquisition module 401, a first style module 402, a segmentation module 403, and a training module 404.
An obtaining module 401 configured to obtain an original image and a real image, where the original image and the real image both include a target object, and a style attribute of the target object included in the original image is different from a style attribute of the target object included in the real image;
a first style module 402, configured to input the original image into a generator of a neural network model, to obtain a stylized image, where the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator;
a segmentation module 403 configured to extract a first edge region of the target object in the stylized image and a second edge region of the target object in the real image;
a training module 404 configured to train the generator and the discriminator according to the first edge region and the second edge region, and use the trained generator as an image style migration model.
The training module comprises:
the first segmentation sub-module is configured to perform image segmentation on the first edge area to obtain a plurality of first image blocks with preset sizes;
the second segmentation sub-module is configured to perform image segmentation on the second edge area to obtain a plurality of second image blocks of the preset size;
a training sub-module configured to train the generator and the discriminator according to the first image patch and the second image patch, and use the trained generator as the image style migration model.
In one implementation, the training submodule includes:
a first training unit configured to train the discriminator according to the first image block and the second image block;
a second training unit configured to train the generator according to the first image block and the discriminator.
In one implementation, the first training unit includes:
the first training subunit is configured to input the first image block and the second image block into the discriminator respectively to obtain a first determination result corresponding to the first image block and a second determination result corresponding to the second image block;
the second training subunit is configured to determine a first loss value according to the difference between a first true value of the first image block and the first determination result, and determine a second loss value according to the difference between a true value of the second image block and the second determination result; the first true value is the target value corresponding to the discriminator judging the input object to be false; the true value of the second image block is the target value corresponding to the discriminator judging the input object to be true;
an alternating training subunit configured to alternately train the discriminator according to the first loss value and the second loss value.
In one implementation, the second training unit includes:
a third training subunit, configured to input the first image block into the discriminator to obtain a third determination result corresponding to the first image block;
a fourth training subunit, configured to determine a third loss value according to the difference between a second true value of the first image block and the third determination result; the second true value is the target value corresponding to the discriminator judging the input object to be true;
a training subunit configured to train the generator according to the third loss value.
In one implementation, the training module includes:
and the iterative training submodule is configured to terminate training after performing iterative training operation on the generator and the discriminator for a preset number of times, and use the trained generator as an image style migration model, wherein one iterative training operation comprises training the generator once and training the discriminator once.
In summary, the training apparatus for an image style migration model provided in the embodiment of the present application may extract a first edge region of a target object in a stylized image and a second edge region of the target object in a real image. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
Fig. 8 is a block diagram of a stylized image generating apparatus according to an embodiment of the present application, and as shown in fig. 8, the stylized image generating apparatus includes: style processing module 501.
The style processing module 501 is configured to input an image to be processed into an image style migration model, and obtain a stylized image output by the image style migration model;
the image style migration model is obtained by the training device of the image style migration model shown in fig. 7.
In summary, the stylized image generating apparatus provided in the embodiment of the present application may extract a first edge region of a target object in a stylized image and a second edge region of the target object in a real image. Therefore, the subsequent training process can take only the edge regions as input, so the model needs to process only edge-region features, which reduces the amount of computation. In addition, the training process can focus on edge-region fusion during image style migration, so changes in the detail texture of the edge region are captured more easily, and the edge-region repair effect of the image stylization result is improved.
Fig. 9 is a block diagram illustrating an electronic device 600 according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is used to store various types of data to support operations at the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 606 provides power to the various components of the electronic device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 600 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 610 is used to output and/or input audio signals. For example, the audio component 610 may include a Microphone (MIC) for receiving external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect an open/closed state of the electronic device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600. The sensor component 614 may also detect a change in the position of the electronic device 600 or a component of the electronic device 600, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in the temperature of the electronic device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is operable to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components, for implementing a training method of an image style migration model provided by an embodiment of the present application.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 604 including instructions, which are executable by the processor 620 of the electronic device 600 to perform the above-described method. For example, the non-transitory storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 10 is a block diagram illustrating an electronic device 700 in accordance with an example embodiment. For example, the electronic device 700 may be provided as a server. Referring to fig. 10, electronic device 700 includes a processing component 722 that further includes one or more processors, and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. In addition, the processing component 722 is configured to execute instructions to perform a method for training an image style migration model provided by an embodiment of the present application.
The electronic device 700 may also include a power component 726 configured to perform power management of the electronic device 700, a wired or wireless network interface 750 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 758. The electronic device 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The embodiment of the present application further provides a computer program product, which includes a computer program that, when executed by a processor, implements the method for training an image style migration model described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for training an image style migration model, the method comprising:
acquiring an original image and a real image, wherein the original image and the real image both comprise a target object, and the style attribute of the target object contained in the original image is different from the style attribute of the target object contained in the real image;
inputting the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator;
extracting a first edge region of the target object in the stylized image and a second edge region of the target object in the real image;
and training the generator and the discriminator according to the first edge region and the second edge region, and taking the trained generator as an image style migration model.
2. The method of claim 1, wherein training the generator and the discriminator according to the first edge region and the second edge region, and taking the trained generator as an image style migration model comprises:
performing image segmentation on the first edge region to obtain a plurality of first image blocks of a preset size;
performing image segmentation on the second edge region to obtain a plurality of second image blocks of the preset size;
and training the generator and the discriminator according to the first image block and the second image block, and taking the trained generator as the image style migration model.
3. The method of claim 2, wherein training the generator and the discriminator according to the first image block and the second image block comprises:
training the discriminator according to the first image block and the second image block;
and training the generator according to the first image block and the discriminator.
4. The method of claim 3, wherein training the discriminator according to the first image block and the second image block comprises:
inputting the first image block and the second image block into the discriminator respectively to obtain a first judgment result corresponding to the first image block and a second judgment result corresponding to the second image block;
determining a first loss value according to a difference between a first true value of the first image block and the first judgment result, and determining a second loss value according to a difference between a second true value of the second image block and the second judgment result; wherein the first true value indicates that the input object is judged as false by the discriminator, and the second true value indicates that the input object is judged as true by the discriminator;
and alternately training the discriminator according to the first loss value and the second loss value.
5. A stylized image generation method, the method comprising:
inputting an image to be processed into an image style migration model to obtain a stylized image output by the image style migration model;
the image style migration model is obtained by training according to the training method of the image style migration model in any one of claims 1 to 4.
6. An apparatus for training an image style migration model, the apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is configured to acquire an original image and a real image, the original image and the real image both contain a target object, and the style attribute of the target object contained in the original image is different from the style attribute of the target object contained in the real image;
the first style module is configured to input the original image into a generator of a neural network model to obtain a stylized image, wherein the stylized image is an image obtained by stylizing a target object in the original image; the neural network model further comprises a discriminator;
a segmentation module configured to extract a first edge region of the target object in the stylized image and a second edge region of the target object in the real image;
a training module configured to train the generator and the discriminator according to the first edge region and the second edge region, and use the trained generator as an image style migration model.
7. A stylized image generating apparatus, the apparatus comprising:
a style processing module configured to input an image to be processed into an image style migration model to obtain a stylized image output by the image style migration model;
wherein the image style migration model is obtained by the apparatus for training an image style migration model according to claim 6.
8. An electronic device, comprising: a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 5.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 5.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 5.
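For illustration only, and without purporting to reproduce the claimed implementation, the training flow recited in claims 1 to 4 could be sketched roughly as follows. The patch size, the binary cross-entropy loss, the split_into_patches and train_step helpers, the extract_edges callable, and the optimizer handling are assumptions introduced for this sketch.

```python
# Rough, hypothetical sketch of the adversarial training flow in claims 1-4:
# edge regions are cut into fixed-size patches, the discriminator is trained to
# judge generated patches as false and real patches as true, and the generator
# is then trained against the discriminator. Losses, patch size, and optimizer
# handling are assumptions, not claim limitations.
import torch
import torch.nn.functional as F

def split_into_patches(edge_region: torch.Tensor, patch: int = 64) -> torch.Tensor:
    """Cut a B x C x H x W edge-region tensor into non-overlapping patches."""
    b, c, h, w = edge_region.shape
    patches = edge_region.unfold(2, patch, patch).unfold(3, patch, patch)
    return patches.reshape(b, c, -1, patch, patch).permute(0, 2, 1, 3, 4).reshape(-1, c, patch, patch)

def train_step(generator, discriminator, g_opt, d_opt, original, real, extract_edges):
    # Stylize the original image, restrict both images to their edge regions
    # (extract_edges is an assumed differentiable masking step), and cut patches.
    stylized = generator(original)
    first_patches = split_into_patches(extract_edges(stylized))   # first edge region patches
    second_patches = split_into_patches(extract_edges(real))      # second edge region patches

    # Discriminator step: first (generated) patches are labelled false,
    # second (real) patches are labelled true.
    d_opt.zero_grad()
    out_fake = discriminator(first_patches.detach())
    out_real = discriminator(second_patches)
    d_loss = F.binary_cross_entropy_with_logits(out_fake, torch.zeros_like(out_fake)) + \
             F.binary_cross_entropy_with_logits(out_real, torch.ones_like(out_real))
    d_loss.backward()
    d_opt.step()

    # Generator step: generated edge patches should now be judged true.
    g_opt.zero_grad()
    out_gen = discriminator(first_patches)
    g_loss = F.binary_cross_entropy_with_logits(out_gen, torch.ones_like(out_gen))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

In this sketch the discriminator sees only edge-region patches, which mirrors the stated benefit that the model processes only edge features and that changes in edge texture are easier to capture.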
CN202110519685.0A 2021-05-12 2021-05-12 Training method and device for image style migration model and electronic equipment Pending CN113409342A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519685.0A CN113409342A (en) 2021-05-12 2021-05-12 Training method and device for image style migration model and electronic equipment

Publications (1)

Publication Number Publication Date
CN113409342A (en) 2021-09-17

Family

ID=77678470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519685.0A Pending CN113409342A (en) 2021-05-12 2021-05-12 Training method and device for image style migration model and electronic equipment

Country Status (1)

Country Link
CN (1) CN113409342A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327422A (en) * 2015-07-01 2017-01-11 北京大学 Image stylized reconstruction method and device
US20200151938A1 (en) * 2018-11-08 2020-05-14 Adobe Inc. Generating stylized-stroke images from source images utilizing style-transfer-neural networks with non-photorealistic-rendering
CN110516577A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN111340905A (en) * 2020-02-13 2020-06-26 北京百度网讯科技有限公司 Image stylization method, apparatus, device, and medium
CN111696028A (en) * 2020-05-22 2020-09-22 华南理工大学 Method and device for processing cartoon of real scene image, computer equipment and storage medium
CN111667400A (en) * 2020-05-30 2020-09-15 温州大学大数据与信息技术研究院 Human face contour feature stylization generation method based on unsupervised learning
CN111784565A (en) * 2020-07-01 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, migration model training method, device, medium and equipment
CN111862274A (en) * 2020-07-21 2020-10-30 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, and image style migration method and device
CN112581358A (en) * 2020-12-17 2021-03-30 北京达佳互联信息技术有限公司 Training method of image processing model, image processing method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881864A (en) * 2021-10-12 2022-08-09 北京九章云极科技有限公司 Training method and device for seal restoration network model
CN114004905A (en) * 2021-10-25 2022-02-01 北京字节跳动网络技术有限公司 Method, device and equipment for generating character style image and storage medium
CN114004905B (en) * 2021-10-25 2024-03-29 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for generating character style pictogram
WO2023138560A1 (en) * 2022-01-24 2023-07-27 北京字跳网络技术有限公司 Stylized image generation method and apparatus, electronic device, and storage medium
CN114387160A (en) * 2022-03-23 2022-04-22 北京大甜绵白糖科技有限公司 Training method, image processing method, device, electronic equipment and storage medium
CN115641256A (en) * 2022-10-25 2023-01-24 复旦大学 Method for training style migration model, method and device for video style migration

Similar Documents

Publication Publication Date Title
CN105488527B (en) Image classification method and device
US10007841B2 (en) Human face recognition method, apparatus and terminal
EP2977956B1 (en) Method, apparatus and device for segmenting an image
CN113409342A (en) Training method and device for image style migration model and electronic equipment
CN107480665B (en) Character detection method and device and computer readable storage medium
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
EP3855360A1 (en) Method and device for training image recognition model, and storage medium
CN108668080B (en) Method and device for prompting degree of dirt of lens and electronic equipment
CN110619350B (en) Image detection method, device and storage medium
CN107464253B (en) Eyebrow positioning method and device
CN107220614B (en) Image recognition method, image recognition device and computer-readable storage medium
CN108921178B (en) Method and device for obtaining image blur degree classification and electronic equipment
CN108154466B (en) Image processing method and device
CN109784164B (en) Foreground identification method and device, electronic equipment and storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
CN109509195B (en) Foreground processing method and device, electronic equipment and storage medium
CN114007099A (en) Video processing method and device for video processing
CN109034106B (en) Face data cleaning method and device
CN110619325B (en) Text recognition method and device
CN112200040A (en) Occlusion image detection method, device and medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN107424130B (en) Picture beautifying method and device
CN112347911A (en) Method and device for adding special effects of fingernails, electronic equipment and storage medium
CN105426904B (en) Photo processing method, device and equipment
CN111145080B (en) Training method of image generation model, image generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination