CN114758035B - Image generation method and device for unpaired data set - Google Patents

Image generation method and device for unpaired data set

Info

Publication number
CN114758035B
CN114758035B
Authority
CN
China
Prior art keywords
model
submodel
data set
training
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210661703.3A
Other languages
Chinese (zh)
Other versions
CN114758035A (en)
Inventor
张丽颖
陈�光
朱世强
曾令仿
程永利
陈兰香
李勇
张云云
朱健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202210661703.3A
Publication of CN114758035A
Application granted
Publication of CN114758035B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image generation method and device for unpaired data sets. The method comprises the following steps: improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel; taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished; acquiring an unpaired data set; inputting the unpaired data set into the trained first model to obtain a first generated data set and a second generated data set produced by the first model; and inputting the first and second generated data sets into the trained first and second submodels respectively, taking the third and fourth generated data sets produced by the two submodels as the final generation results.

Description

Image generation method and device for unpaired data set
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an image generation method and device for an unpaired data set.
Background
Generative Adversarial Networks (GANs) have achieved significant results in many areas such as image generation, image editing and representation learning. The success of GANs rests on the idea of the adversarial loss, whose main purpose is to make generated images theoretically indistinguishable from real photographs, which is exactly the goal the computer vision field aims to optimize. A GAN learns the source-domain-to-target-domain mapping through the adversarial loss, making the generated images indistinguishable from images in the target domain. Since 2018, scholars have proposed many models for the image-to-image generation task, and there are two main types of typical methods. The first learns a mapping from a paired data set to generate images, and currently yields the best image generation results. Accordingly, most of the prior art learns the input-to-output image mapping from matched training examples, and the advantage of this series of models is that their training results form the upper bound of the image generation field. But their disadvantage is also evident: they depend on paired data sets, data collection is costly, and the scope of application is narrow.
The second category of methods is based on unmatched data sets and only requires a consistent distribution within each data set. Such models can be trained on unpaired data sets and eventually approach the results of training on paired data sets, though a large gap remains. The advantages of these algorithms are obvious: the training data set is less restricted and the application range is wider, making them a relatively universal solution. Some of these models rely on predefined similarity functions between input and output, and some assume that input and output must lie in the same low-dimensional embedding space. Within this series of models, the CycleGAN model needs none of these assumptions and limitations, and the idea of cycle consistency makes it stand out; still, there is an insurmountable gap compared with the results of the first category of methods.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
the generation methods based on unpaired images have wider application scenarios. However, compared with a model trained on a paired data set, the realism, image quality, and so on of the pictures such a model generates show an insurmountable gap.
Disclosure of Invention
In view of the deficiencies of the prior art, an object of the embodiments of the present application is to provide an image generation method and apparatus for unpaired data sets.
According to a first aspect of embodiments of the present application, there is provided an image generation method for an unpaired data set, comprising:
improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished;
acquiring an unpaired data set;
inputting the unpaired data set into a trained first model to obtain a first generated data set and a second generated data set generated by the first model;
and inputting the first generated data set and the second generated data set into the trained first submodel and second submodel respectively, and taking the third generated data set and the fourth generated data set generated by the first submodel and the second submodel as the final generation results.
Further, improving the first model comprises the following step:
modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss.
Further, the piecewise loss means that, for two groups of unpaired data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X.
Further, improving the second model comprises the following step:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
Further, taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model and training the improved first model comprises:
constructing two groups of unpaired data sets X and Y whose internal data follow the same distribution, and using them as the input of the improved first model to learn the mapping X → Y and the mapping Y → X;
optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false picture set A paired with the unpaired data set X and a second false picture set B paired with the unpaired data set Y.
Further, training the improved second model with the two groups of paired data sets output after training of the improved first model is finished comprises:
taking the paired data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
taking the paired data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
According to a second aspect of embodiments of the present application, there is provided an image generation apparatus for an unpaired data set, comprising:
the improvement module is used for improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
the training module is used for taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished;
the acquisition module is used for acquiring a data set to be paired;
the first input module is used for inputting the data set to be paired into the first model to obtain a first generation data set generated by a first generator of the first model;
and the second input module is used for inputting the first generated data set into the improved second model and taking the second generated data set generated by the second model as a final generated result.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired data sets as described in the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the image generation method for unpaired datasets as described in the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the embodiments, only the unpaired training data set is needed, and the limitation on the data set is smaller. The cost and manpower for data collection are greatly reduced, and the method has universality in an image generation task; through the correction of the second model, the combined model can achieve the result consistent with the application pairing data set training model, so that the generated image is more vivid.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of image generation for an unpaired data set, according to an example embodiment.
FIG. 2 is a diagram illustrating three loss functions in a CycleGAN model according to an exemplary embodiment.
Fig. 3 is a network architecture diagram illustrating a Pix2Pix model according to an example embodiment.
FIG. 4 is a flowchart illustrating training of an improved first model according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating training of an improved second model with two pairs of paired data sets output after training of the improved first model is completed, according to an example embodiment.
FIG. 6 is a block diagram illustrating an image generation apparatus for unpaired datasets in accordance with an exemplary embodiment.
FIG. 7 is a schematic diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "upon," "when," or "in response to determining," depending on the context.
Explanations of terms:
Paired data set: for two data sets $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$ in the source domains, the pictures must correspond one to one, i.e. each $x_i$ and $y_i$ must form a picture pair.
Unpaired data set: for two data sets in the source domains, no picture-by-picture pairing is required; it is only required that the data distribution inside $X$ is consistent and the data distribution inside $Y$ is consistent. That is, only the two sets $X$ and $Y$ need to correspond as wholes; no correspondence between an individual $x_i$ and $y_i$ is required.
GAN model framework: a framework for estimating generative models via an adversarial process. Two models are trained simultaneously in this framework: a generative model $G$ that captures the data distribution, and a discriminative model $D$ that estimates the probability that a sample came from the training data. Both models in the invention are improved on the basis of GAN;
CycleGAN model: learns the mapping between the $X$ picture domain and the $Y$ picture domain without picture pairs (i.e. without two pictures in one-to-one correspondence). The goal is to learn, through the adversarial loss, a mapping $G: X \rightarrow Y$ such that the distribution of $G(X)$ can be close to the distribution of $Y$. Since this mapping is highly under-constrained, it is combined with an inverse mapping $F: Y \rightarrow X$, and a cycle consistency loss $\mathcal{L}_{cyc}$ is introduced to enforce $F(G(x)) \approx x$;
Cycle consistency loss: drawing on the idea of using transitivity to regularize structured data, the cycle consistency loss supervises the training of CycleGAN in a transitive manner;
Adversarial loss: in the GAN framework there is a zero-sum game between the generator and the discriminator, and the adversarial loss quantifies this zero-sum game. The discriminator attempts to judge the false pictures produced by the generator as false (i.e. the smaller the probability of being judged true, the better) and the true pictures in the source domain as true (i.e. the larger the probability of being judged true, the better). The generator, in turn, needs to exploit the discriminator's mistakes: the larger the probability that the discriminator judges the generated false pictures as true, the better.
FIG. 1 is a flow chart illustrating an image generation method for unpaired data sets. As shown in FIG. 1, the method may include the following steps:
step S11: improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
step S12: taking two groups of unpaired data sets with the same data distribution in the interior as the input of the improved first model, training the improved first model, and respectively training the improved first sub-model and the improved second sub-model through two groups of paired data sets output after the training of the improved first model is finished;
step S13: acquiring an unpaired data set;
step S14: inputting the unpaired data set into a trained first model to obtain a first generated data set and a second generated data set generated by the first model;
step S15: and inputting the first generation data set and the second generation data set into the trained first sub-model and second sub-model respectively, and taking a third generation data set and a fourth generation data set generated by the first sub-model and the second sub-model as final generation results.
As can be seen from the above steps, only an unpaired training data set is needed, so the restriction on the data set is smaller; the cost and manpower of data collection are greatly reduced, and the method is universal in image generation tasks. Through the correction applied by the second model, the joint model can achieve results consistent with models trained on paired data sets, so that the generated images are more realistic.
In a specific implementation, the first model is a generation model for unpaired data sets, such as CycleGAN, CoGAN, StarGAN, UNIT, and the like, and the first submodel and the second submodel in the second model are generation models for paired data sets, such as Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN, SimGAN, and the like. In this embodiment, the first model adopts the CycleGAN model, and the first submodel and the second submodel of the second model both adopt the Pix2Pix model.
In an implementation of step S11, the first model and the second model are improved, wherein the second model includes a first submodel and a second submodel;
Specifically, improving the first model comprises the following step:
modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss.
Here, the piecewise loss means that, for two groups of unpaired data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X.
In the present embodiment, for the mapping $G: X \rightarrow Y$ and its discriminator $D_Y$, the adversarial loss is:

$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$

where $x \sim p_{data}(x)$ denotes that $x$ obeys the data distribution $p_{data}(x)$, and $y \sim p_{data}(y)$ likewise. $D_Y$ aims to maximize this objective loss, confronted with an adversary $G$ that tries to minimize it, namely:

$\min_{G} \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$

For the mapping $F: Y \rightarrow X$ and its discriminator $D_X$, a similar adversarial loss is introduced, namely:

$\min_{F} \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X)$

The improved (piecewise) cycle consistency loss is defined as:

$\mathcal{L}_{cyc}(G, F) = \begin{cases} \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1], & \text{when learning } G: X \rightarrow Y \\ \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1], & \text{when learning } F: Y \rightarrow X \end{cases}$

The identity mapping loss is:

$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(y) - y \rVert_1] + \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(x) - x \rVert_1]$

The full objective loss of the model is:

$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) + \mathcal{L}_{identity}(G, F)$

where $\lambda$ denotes the weighting parameter of $\mathcal{L}_{cyc}$; it is a hyper-parameter whose value can be adjusted during training to tune the influence of the cycle consistency loss on the overall loss.

The optimization target is:

$G^{*}, F^{*} = \arg \min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$
specifically, the network structure of the improved CycleGAN is divided into two parts, one part is a generator, and the method comprises the following steps: c7s1-64, d128, d256, R256, u128, u64, c7s1-3, wherein c7s1-k denotes a 7 × 7 contribution-InstanceNorm-ReLU layer with k filters and step size 1; dk denotes a 3 × 3 contribution-InstanceNorm-ReLU layer with k filters and step size 2. Reflective padding is used to reduce artifacts; rk represents a residual block comprising two 3 × 3 convolutional layers, with the same number of filters on both layers; uk denotes a 3 x 3 fractional-strained-volume-InstanceNorm-ReLU layer with k filters and a step size 1/2.
The other part is the discriminator: C64-C128-C256-C512, where Ck denotes a 4 × 4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, a convolution is applied to produce a one-dimensional output; InstanceNorm is not used in the first C64 layer; LeakyReLU with a slope of 0.2 is used. A minimal sketch of both parts follows.
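To make the layer shorthand above concrete, the following is a minimal PyTorch sketch of the described generator and discriminator. It is an illustrative reading of the specification, not the claimed implementation; details such as the choice of 9 residual blocks, the Tanh output layer and the exact padding arrangement are assumptions beyond the text.

    import torch
    import torch.nn as nn

    def c7s1(in_ch, out_ch, last=False):
        # c7s1-k: 7x7 Convolution-InstanceNorm-ReLU layer with stride 1 and reflection padding
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, out_ch, 7)]
        layers += [nn.Tanh()] if last else [nn.InstanceNorm2d(out_ch), nn.ReLU(True)]
        return layers

    class ResidualBlock(nn.Module):
        # Rk: two 3x3 convolutional layers with the same number of filters on both layers
        def __init__(self, ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
                nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))

        def forward(self, x):
            return x + self.block(x)

    class Generator(nn.Module):
        # c7s1-64, d128, d256, R256 x n_blocks, u128, u64, c7s1-3
        def __init__(self, n_blocks=9):  # 9 residual blocks for 256x256 inputs (assumption)
            super().__init__()
            layers = c7s1(3, 64)
            for ch in (128, 256):  # dk: 3x3 stride-2 downsampling convolutions
                layers += [nn.Conv2d(ch // 2, ch, 3, stride=2, padding=1),
                           nn.InstanceNorm2d(ch), nn.ReLU(True)]
            layers += [ResidualBlock(256) for _ in range(n_blocks)]
            for ch in (128, 64):   # uk: stride 1/2, i.e. stride-2 transposed convolutions
                layers += [nn.ConvTranspose2d(ch * 2, ch, 3, stride=2, padding=1, output_padding=1),
                           nn.InstanceNorm2d(ch), nn.ReLU(True)]
            layers += c7s1(64, 3, last=True)
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

    class Discriminator(nn.Module):
        # C64-C128-C256-C512: 4x4 Convolution-InstanceNorm-LeakyReLU layers with stride 2;
        # InstanceNorm is omitted in the first C64 layer; a final convolution yields a 1-channel map
        def __init__(self):
            super().__init__()
            layers, in_ch = [], 3
            for i, ch in enumerate((64, 128, 256, 512)):
                layers.append(nn.Conv2d(in_ch, ch, 4, stride=2, padding=1))
                if i > 0:
                    layers.append(nn.InstanceNorm2d(ch))
                layers.append(nn.LeakyReLU(0.2, True))
                in_ch = ch
            layers.append(nn.Conv2d(512, 1, 4, padding=1))
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)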
Specifically, the embodiment of the present application takes the image deformation task of converting a horse into a zebra as an example. The cycle consistency loss is modified into a piecewise function in order to prevent the zebra pictures produced by the generator from failing to correspond one to one with the real input horse pictures. As shown in FIG. 2, FIG. 2 intuitively explains the effect of the three losses in the design of the CycleGAN model. Taking horse pictures as X and zebra pictures as Y, X is input into G and fake_Y is output. At this moment, fake_Y must be a zebra picture that the discriminator $D_Y$ cannot tell apart from real ones, and the loss $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ is designed to ensure this. However, fake_Y should not only be a zebra picture, it should also correspond one to one with X; that is, fake_Y should be the zebra obtained by adding stripes on the basis of X, not an arbitrary zebra picture. To ensure this, the loss $\mathcal{L}_{cyc}$ is designed, and the cycle consistency loss is further set to a piecewise function. When learning the mapping G, only the forward cycle consistency loss is used, and the reverse cycle consistency loss is not added. This is because, when the fake_Y generated by G is a zebra that does not correspond to the original picture X, $\mathcal{L}_{GAN}$ is not enough to identify this error, while the reverse cycle consistency loss would push F to learn to turn fake_Y back into a false picture resembling the original as much as possible. At this point, the reverse cycle consistency loss conceals the error in G, and the zebra pictures finally generated by G cannot correspond one to one with the original pictures X. Symmetrically, the forward cycle consistency loss would shield errors in F, so only the reverse cycle consistency loss is used when training F; a sketch of this piecewise loss follows.
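The piecewise cycle consistency loss described above could be realized, for example, as follows. This is a minimal PyTorch sketch; the function signature and the keyword-argument convention are illustrative assumptions.

    import torch.nn as nn

    l1 = nn.L1Loss()

    def cycle_loss(G, F, real_x=None, real_y=None):
        """Piecewise cycle consistency loss.

        When learning G: X -> Y, only the forward term ||F(G(x)) - x||_1 is used,
        so the reverse term cannot conceal errors in G; symmetrically, when
        learning F: Y -> X, only the reverse term ||G(F(y)) - y||_1 is used.
        """
        if real_x is not None:   # learning G: forward cycle consistency
            return l1(F(G(real_x)), real_x)
        else:                    # learning F: reverse cycle consistency
            return l1(G(F(real_y)), real_y)

In a training step, the update of G would add lambda_cyc * cycle_loss(G, F, real_x=x) to its adversarial and identity terms, and the update of F would add lambda_cyc * cycle_loss(G, F, real_y=y).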
For unpaired data sets consisting of pictures of 256 × 256 or higher resolution, the network structure given in J. Zhu, T. Park, P. Isola and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242-2251, is used. Practice shows that this network structure setting and hyper-parameter setting make the model converge faster and produce better images, so this patent continues to use this network structure. If one of the other GAN-family models mentioned above is used instead, the network structure that performs best for it in the image generation task can likewise be adopted.
Specifically, improving the second model comprises the following step:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
In this embodiment, Dropout is used to add random noise, the adversarial loss and the traditional L1 loss are combined into the final full objective function, and the full objective function is optimized to make the network converge. The improved full objective function of the Pix2Pix model is defined as:

$G^{*} = \arg \min_{G} \max_{D} \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$

where $\mathcal{L}_{cGAN}(G, D)$ is expressed as follows:

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}[\log D(x, y)] + \mathbb{E}_{x, z}[\log(1 - D(x, G(x, z)))]$

and $\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}[\lVert y - G(x, z) \rVert_1]$ is the traditional loss. Here G denotes the generator of the Pix2Pix model, D denotes the discriminator of the Pix2Pix model, and z is the random influence introduced by adding Dropout in the generator network; x and y denote the input of the Pix2Pix model, where x is a false picture generated by the generator of model one and y is the actual real picture corresponding to x. (Assuming the paired data set (Y, B) is obtained after model one is trained, the real picture Y is the y in the formula, and the generated picture B is the x in the formula.)
When the zebra pictures generated by the generator of model one can fool the discriminator $D_Y$, and the generated horse pictures can fool $D_X$, the discriminator $D_Y$ at this point is a mature discriminator capable of identifying whether a picture is a zebra. A discriminator capable of judging whether a picture is a zebra must likewise be trained inside the submodels of model two. In order to accelerate the convergence of the submodels in model two, the adversarial loss $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ from model one is multiplied by a weight factor and added to the full objective function of the submodel; during training, the weight factor is set to 0.01. Similarly, when converting zebras into horses, $\mathcal{L}_{GAN}(F, D_X, Y, X)$ is added to the full objective function in the same way, as sketched below.
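A minimal sketch of how the weighted adversarial term could enter the full objective of a model-two submodel is given below. The weight w = 0.01 comes from the text; the L1 weight of 100, the least-squares form of the transferred term and all names are assumptions.

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    def submodel_generator_loss(G2, D2, D_Y, x_fake, y_real, lambda_l1=100.0, w=0.01):
        """Full objective of a model-two submodel's generator.

        x_fake: picture generated by model one (input of Pix2Pix);
        y_real: the corresponding real picture;
        D_Y:    the frozen, already-trained discriminator taken from model one;
        w:      preset weight factor for the transferred adversarial loss (0.01).
        """
        y_fake = G2(x_fake)
        pred = D2(torch.cat([x_fake, y_fake], dim=1))  # conditional PatchGAN judges the (input, output) pair
        loss_cgan = bce(pred, torch.ones_like(pred))   # generator wants D2 to judge the pair as real
        loss_l1 = l1(y_fake, y_real)                   # traditional L1 term of Pix2Pix
        # adversarial loss inherited from model one, in least-squares form, times the weight factor
        loss_from_model_one = torch.mean((D_Y(y_fake) - 1) ** 2)
        return loss_cgan + lambda_l1 * loss_l1 + w * loss_from_model_one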
The network structure of the Pix2Pix model is shown in FIG. 3, where X represents a zebra picture generated by the first model and Real_y represents the real horse picture corresponding to X. The generator G uses a U-Net structure, and the discriminator D uses a PatchGAN structure. Ck denotes a Convolution-BatchNorm-ReLU layer with k filters; CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with k filters and a Dropout ratio of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. The convolutions in the encoder and in the discriminator downsample by a factor of 2, while those in the decoder upsample by a factor of 2. The generator architecture is as follows: the encoder is C64-C128-C256-C512-C512-C512, and the U-Net decoder is CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128. The 70 × 70 discriminator structure is: C64-C128-C256-C512. This network structure is the better-performing structure proposed in the Pix2Pix paper: P. Isola, J. Zhu, T. Zhou and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017.632. A minimal sketch of the discriminator part follows.
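As a companion to the architecture description, the following sketches the 70 × 70 PatchGAN discriminator (C64-C128-C256-C512) judging an (input, output) picture pair. The BatchNorm placement and the stride-1 last block follow common practice for a 70 × 70 receptive field and are assumptions here.

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        # C64-C128-C256-C512 on the channel-wise concatenation of the two pictures,
        # followed by a final convolution producing a patch map of real/fake scores.
        def __init__(self, in_ch=6):  # 6 = two RGB pictures concatenated
            super().__init__()
            seq, ch = [], in_ch
            for i, out in enumerate((64, 128, 256, 512)):
                seq.append(nn.Conv2d(ch, out, 4, stride=2 if i < 3 else 1, padding=1))
                if i > 0:
                    seq.append(nn.BatchNorm2d(out))
                seq.append(nn.LeakyReLU(0.2, True))
                ch = out
            seq.append(nn.Conv2d(512, 1, 4, stride=1, padding=1))
            self.net = nn.Sequential(*seq)

        def forward(self, x, y):
            # each spatial score judges a 70x70 receptive-field patch of the (x, y) pair
            return self.net(torch.cat([x, y], dim=1))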
In the specific implementation of step S12, two groups of unpaired data sets, each with internally consistent data distribution, are used as the input of the improved first model; the improved first model is trained; and the improved first submodel and second submodel are trained respectively with the two groups of paired data sets output after the training of the improved first model is finished;
specifically, as shown in fig. 4, the process of training the improved first model includes:
step S21: two groups of unpaired data sets X and Y with internal data in the same distribution are constructed and are used as the input of the improved first model to learn mapping X → Y and mapping Y → X;
specifically, taking the generation of an image of a horse converted into a zebra in an image deformation task as an example, two sets of datasets of the horse and the zebra are downloaded from an ImageNet website. The data set for the horse contained 939 pictures and the data set for the zebra contained 1177 images. All picture data is scaled so that the resolution of each picture is 256 × 256.
Step S22: optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false picture set A paired with the unpaired data set X and a second false picture set B paired with the unpaired data set Y.
Specifically, the adversarial loss, the improved cycle consistency loss and the identity mapping loss are calculated, and the full objective function is optimized. An Adam solver is applied to compute the gradients and update the network parameters. To make the model more stable during training, the generators are trained with a least-squares loss instead of the negative log-likelihood objective; that is, the generator G is trained to minimize:

$\mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G(x)) - 1)^2]$

and the discriminator $D_Y$ is trained to minimize:

$\mathbb{E}_{y \sim p_{data}(y)}[(D_Y(y) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[D_Y(G(x))^2]$

To reduce model oscillation, the discriminators are updated using a history of generated images: an image buffer storing the 60 previously generated images is kept during training.
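The least-squares objectives and the image history buffer could look as follows. This is a minimal sketch; the buffer's 50%-swap replacement policy mirrors the common CycleGAN implementation and is an assumption beyond the text.

    import random
    import torch

    def g_adv_loss(D_Y, fake_y):
        # generator minimizes E[(D_Y(G(x)) - 1)^2]
        return torch.mean((D_Y(fake_y) - 1) ** 2)

    def d_adv_loss(D_Y, real_y, fake_y):
        # discriminator minimizes E[(D_Y(y) - 1)^2] + E[D_Y(G(x))^2]
        return torch.mean((D_Y(real_y) - 1) ** 2) + torch.mean(D_Y(fake_y.detach()) ** 2)

    class ImageBuffer:
        """Stores the 60 previously generated images so the discriminator
        can be updated with historical data, reducing model oscillation."""
        def __init__(self, size=60):
            self.size, self.images = size, []

        def query(self, image):
            if len(self.images) < self.size:   # buffer not yet full: keep and return the new image
                self.images.append(image)
                return image
            if random.random() < 0.5:          # otherwise, half the time swap in a stored image
                idx = random.randrange(self.size)
                old, self.images[idx] = self.images[idx], image
                return old
            return image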
When the improved first model converges, two sets of false pictures are output. One is the set of zebra images generated from the horse images $x_i$ in the horse data set, recorded as A; the other is the set of horse images generated from the zebra images $y_i$ in the zebra data set, recorded as B.
Specifically, training the improved second model with the two groups of paired data sets output after the training of the improved first model is finished, as shown in FIG. 5, includes:
Step S31: taking the paired data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
in a specific implementation, Dropout is added to the generator part to increase randomness during model training, and Adam updater is used to perform gradient solving and update network parameters. The learning rate remains unchanged for the first 100 epochs and then decays to 0 in a linear trend. And when the false picture generated by the first submodel Pix2Pix can deceive the discriminator of the first submodel Pix2Pix, the first submodel Pix2Pix model converges, and the training is finished. In this step, to prevent the network structure from being too complex, the user may reduce the network of the Pix2Pix generator part, e.g. 2-3 blocks may be used. The first sub-model Pix2Pix generator will output a new zebra picture, which is recorded as
Figure 636842DEST_PATH_IMAGE043
I.e. the conversion of the horse into the final result of the zebra image generation task.
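The described learning-rate schedule, constant for the first 100 epochs and then decaying linearly to 0, can be expressed with a LambdaLR scheduler. The total of 200 epochs, the learning rate of 2e-4 and the Adam betas follow common GAN practice and are assumptions here; G2 denotes the (assumed already defined) submodel generator.

    import torch

    optimizer = torch.optim.Adam(G2.parameters(), lr=2e-4, betas=(0.5, 0.999))

    n_keep, n_decay = 100, 100  # constant for 100 epochs, then decay linearly over 100 more

    def lr_lambda(epoch):
        # multiplicative factor: 1.0 up to epoch 100, then a linear ramp down to 0
        return 1.0 - max(0, epoch - n_keep) / float(n_decay)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
    # call scheduler.step() once per epoch, after the training-loop body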
Step S32: taking the paired data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
The specific implementation of step S32 is the same as the specific implementation of step S31, and is not described herein again.
In a specific implementation of step S13, obtaining an unpaired data set;
specifically, taking the generation of an image of a horse converted into a zebra in an image deformation task as an example, two sets of datasets of the horse and the zebra are downloaded from an ImageNet website. The data set of the horse comprises 939 pictures, and the data set of the zebra comprises 1177 pictures. All picture data is scaled so that the resolution of each picture is 256 × 256. A set of unpaired datasets (horses, zebras) was constructed. The data set uses the public data set on the ImageNet website, so that on one hand, the cost of data collection by the data set is saved, on the other hand, the public data set has higher quality, other similar models also use the data set for training, and after the data sets are unified, the training effects of a plurality of models are easier to compare.
In a specific implementation of step S14, after the unpaired data set is input into the trained first model, a first generated data set and a second generated data set generated by the first model are obtained;
specifically, the data sets of the horse and the zebra are input into an improved first model, and two false pictures are output after the training of the first model converges. One is to centralize the horse by the horse data
Figure 335808DEST_PATH_IMAGE003
Image of the zebra generated by the image of (2), and is recorded as
Figure 458485DEST_PATH_IMAGE040
The other is zebra data set
Figure 559296DEST_PATH_IMAGE041
Image of horse, noted
Figure 742015DEST_PATH_IMAGE042
. Finally, two sets of paired data sets are constructed, i.e.
Figure 695321DEST_PATH_IMAGE044
And
Figure 621689DEST_PATH_IMAGE045
preparing an input data set for training the second model. Since the first model is a generated picture trained based on an unpaired dataset. Current research shows that, compared with a model trained using a paired data set, a generation method based on an unpaired image has insurmountable differences in the authenticity, image quality, and the like of a generated image of the model from an image generated by training the model using the paired data set. Thus, the present invention combines the picture generated by the first model and the real picture into a paired data set, which is used as an input for the second model. And further correcting the generated picture to narrow the difference of the description.
In a specific implementation of step S15, the first and second generated data sets are respectively input into the trained first and second submodels, and the third and fourth generated data sets generated by the first and second submodels are used as final generated results.
In a specific implementation, the data sets generated by the first model are input into the first submodel and the second submodel of the improved second model respectively: the data set (X, A) is input into the first submodel of the improved second model, and the data set (B, Y) is input into the second submodel of the improved second model. The purpose of this step is to correct A and B separately. When A and B can respectively fool the discriminators of the first submodel and the second submodel, the generators of the first submodel and the second submodel output the final results X_result and Y_result respectively, where X_result represents the zebra pictures corresponding to X, and Y_result represents the horse pictures corresponding to Y.
The benefit of this design is that the first model outputs two data sets, namely the zebra data set A corresponding to the horse data set X and the horse data set B corresponding to the zebra data set Y; the invention designs the two-submodel structure in the second model precisely in order to correct these two generated data sets. This makes the joint model more complete: the whole model can output pictures in two styles, i.e. it outputs a set of pictures in both the horse style and the zebra style.
In correspondence with the aforementioned embodiments of the image generation method for unpaired datasets, the present application also provides embodiments of an image generation apparatus for unpaired datasets.
FIG. 6 is a block diagram illustrating an image generation apparatus for unpaired datasets in accordance with an exemplary embodiment. Referring to fig. 6, the apparatus may include:
an improvement module 21, configured to improve a first model and a second model, where the second model includes a first sub-model and a second sub-model;
the training module 22 is configured to take two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, train the improved first model, and train the improved first submodel and second submodel respectively with the two groups of paired data sets output after the training of the improved first model is finished;
the obtaining module 23 is configured to obtain a data set to be paired;
a first input module 24, configured to input the data set to be paired into the first model, and obtain a first generated data set generated by a first generator of the first model;
and a second input module 25, configured to input the first generated data set into the improved second model, and use the second generated data set generated by the second model as a final generated result.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application further provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired data sets as described above. FIG. 7 is a hardware structure diagram of a device with data processing capability on which the image generation method for unpaired data sets is deployed. In addition to the processor, memory, and network interface shown in FIG. 7, the device in an embodiment may also include other hardware according to its actual function, which is not described again here.
Accordingly, the present application also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the image generation method for unpaired data sets as described above. The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory Card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit of a device with data processing capability and an external storage device. The computer readable storage medium is used for storing the computer program and the other programs and data required by that device, and may also be used for temporarily storing data that has been or will be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (6)

1. An image generation method for unpaired datasets, comprising:
improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel, the first model is a generation model for unpaired image data sets selected from CycleGAN, CoGAN, StarGAN and UNIT, and the first submodel and the second submodel in the second model are generation models for paired image data sets selected from Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN and SimGAN;
taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired image data sets output after the training of the improved first model is finished, wherein the training of the first model is suspended when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; the training of the first submodel is stopped when the generation results of the second generator of the first submodel can fool the second discriminator of the first submodel; and the training of the second submodel is stopped when the generation results of the third generator of the second submodel can fool the third discriminator of the second submodel;
acquiring an unpaired image dataset;
inputting the unpaired image data set into a trained first model to obtain a first generated image data set and a second generated image data set generated by the first model;
inputting the first generated image data set and the second generated image data set into a trained first sub-model and a trained second sub-model respectively, and taking a third generated image data set and a fourth generated image data set generated by the first sub-model and the second sub-model as final generation results;
wherein improving the first model comprises: modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss;
the piecewise loss means that, for two groups of unpaired image data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X;
wherein improving the second model comprises:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
2. The method of claim 1, wherein taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model and training the improved first model comprises:
constructing two groups of unpaired image data sets X and Y whose internal data follow the same distribution, and using them as the input of the improved first model to learn the mapping X → Y and the mapping Y → X;
optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false image set A paired with the unpaired image data set X and a second false image set B paired with the unpaired image data set Y.
3. The method of claim 1, wherein training the improved second model with the two groups of paired image data sets output after the training of the improved first model is finished comprises:
taking the paired image data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
taking the paired image data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
4. An image generation apparatus for unpaired datasets, comprising:
the improvement module is used for improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel, the first model is a generation model for unpaired image data sets selected from CycleGAN, CoGAN, StarGAN and UNIT, and the first submodel and the second submodel in the second model are generation models for paired image data sets selected from Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN and SimGAN;
the training module is used for taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired image data sets output after the training of the improved first model is finished, wherein the training of the first model is suspended when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; the training of the first submodel is stopped when the generation results of the second generator of the first submodel can fool the second discriminator of the first submodel; and the training of the second submodel is stopped when the generation results of the third generator of the second submodel can fool the third discriminator of the second submodel;
the acquisition module is used for acquiring an image data set to be paired;
the first input module is used for inputting the image data set to be paired into the first model to obtain a first generated image data set generated by a first generator of the first model;
a second input module, configured to input the first generated image data set into an improved second model, and use a second generated image data set generated by the second model as a final generation result;
wherein improving the first model comprises: modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss;
the piecewise loss means that, for two groups of unpaired image data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X;
wherein improving the second model comprises:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
5. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired datasets of any one of claims 1-3.
6. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, carry out the steps of the image generation method for unpaired datasets as claimed in any one of claims 1 to 3.
CN202210661703.3A 2022-06-13 2022-06-13 Image generation method and device for unpaired data set Active CN114758035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210661703.3A CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210661703.3A CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Publications (2)

Publication Number Publication Date
CN114758035A CN114758035A (en) 2022-07-15
CN114758035B true CN114758035B (en) 2022-09-27

Family

ID=82336449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210661703.3A Active CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Country Status (1)

Country Link
CN (1) CN114758035B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529949A (en) * 2020-12-08 2021-03-19 北京安德医智科技有限公司 Method and system for generating DWI image based on T2 image
CN114331821A (en) * 2021-12-29 2022-04-12 中国人民解放军火箭军工程大学 Image conversion method and system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10643320B2 (en) * 2017-11-15 2020-05-05 Toyota Research Institute, Inc. Adversarial learning of photorealistic post-processing of simulation with privileged information
CN108615073B (en) * 2018-04-28 2020-11-03 京东数字科技控股有限公司 Image processing method and device, computer readable storage medium and electronic device
CN109064389B (en) * 2018-08-01 2023-04-18 福州大学 Deep learning method for generating realistic images by hand-drawn line drawings
US11048974B2 (en) * 2019-05-06 2021-06-29 Agora Lab, Inc. Effective structure keeping for generative adversarial networks for single image super resolution
WO2021092686A1 (en) * 2019-11-15 2021-05-20 Modiface Inc. Image-to-image translation using unpaired data for supervised learning
EP4060572A4 (en) * 2019-12-26 2023-07-19 Telefónica, S.A. Computer-implemented method for accelerating convergence in the training of generative adversarial networks (gan) to generate synthetic network traffic, and computer programs of same
CN112488243A (en) * 2020-12-18 2021-03-12 北京享云智汇科技有限公司 Image translation method
KR102288759B1 (en) * 2021-03-26 2021-08-11 인하대학교 산학협력단 Method and Apparatus for Construction of Controllable Image Dataset in Generative Adversarial Networks
CN113643400B (en) * 2021-08-23 2022-05-24 哈尔滨工业大学(威海) Image generation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529949A (en) * 2020-12-08 2021-03-19 北京安德医智科技有限公司 Method and system for generating DWI image based on T2 image
CN114331821A (en) * 2021-12-29 2022-04-12 中国人民解放军火箭军工程大学 Image conversion method and system

Also Published As

Publication number Publication date
CN114758035A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111199550B (en) Training method, segmentation method, device and storage medium of image segmentation network
WO2021254499A1 (en) Editing model generation method and apparatus, face image editing method and apparatus, device, and medium
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
CN112507617B (en) Training method of SRFlow super-resolution model and face recognition method
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
WO2022160657A1 (en) High-definition face swap video generation method and system
CN110288513B (en) Method, apparatus, device and storage medium for changing face attribute
CN112861659B (en) Image model training method and device, electronic equipment and storage medium
CN109409201A (en) A kind of pedestrian's recognition methods again based on shared and peculiar dictionary to combination learning
Iskakov Semi-parametric image inpainting
CN111898571A (en) Action recognition system and method
Dong et al. Self-supervised colorization towards monochrome-color camera systems using cycle CNN
Liu et al. Facial image inpainting using multi-level generative network
Huang et al. Steganography embedding cost learning with generative multi-adversarial network
CN112541566B (en) Image translation method based on reconstruction loss
CN114758035B (en) Image generation method and device for unpaired data set
CN109598201B (en) Action detection method and device, electronic equipment and readable storage medium
CN116129417A (en) Digital instrument reading detection method based on low-quality image
Teng et al. Unimodal face classification with multimodal training
CN113344792B (en) Image generation method and device and electronic equipment
CN116546304A (en) Parameter configuration method, device, equipment, storage medium and product
CN116977794B (en) Digital human video identification model training method and system based on reinforcement learning
KR102678473B1 (en) Automatic Caricature Generating Method and Apparatus
CN115938546B (en) Early gastric cancer image synthesis method, system, equipment and storage medium
JP6674393B2 (en) Feature amount registration device, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant