CN114758035B - Image generation method and device for unpaired data set - Google Patents

Image generation method and device for unpaired data set

Info

Publication number
CN114758035B
CN114758035B
Authority
CN
China
Prior art keywords
model
submodel
data set
training
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210661703.3A
Other languages
Chinese (zh)
Other versions
CN114758035A (en)
Inventor
张丽颖
陈�光
朱世强
曾令仿
程永利
陈兰香
李勇
张云云
朱健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202210661703.3A
Publication of CN114758035A
Application granted
Publication of CN114758035B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image generation method and device for unpaired data sets. The method comprises the following steps: improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel; taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished; acquiring an unpaired data set; inputting the unpaired data set into the trained first model to obtain a first generated data set and a second generated data set produced by the first model; and inputting the first and second generated data sets into the trained first and second submodels respectively, taking the third and fourth generated data sets produced by the two submodels as the final generation results.

Description

Image generation method and device for unpaired data set
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an image generation method and device for an unpaired data set.
Background
Generative Adversarial Networks (GANs) have achieved significant results in many areas such as image generation, image editing and representation learning. The success of GANs rests on the idea of the adversarial loss, whose main purpose is to make generated images theoretically indistinguishable from real photographs, which is exactly the goal the computer vision field aims to optimize. A GAN learns the source-domain-to-target-domain mapping through the adversarial loss, making the generated images indistinguishable from images in the target domain. Since 2018, scholars have proposed many models for the image-to-image generation task, and there are two main types of typical methods. The first learns a mapping from a paired data set to generate images, and currently yields the best image generation results. Accordingly, most of the prior art learns the input-to-output image mapping from matched training examples, and the advantage of this series of models is that their training results form the upper bound of the image generation field. But their disadvantage is also evident: they depend on paired data sets, data collection is costly, and the scope of application is narrow.
The second category of methods is based on unmatched data sets and only requires a consistent distribution within each data set. Such models can be trained on unpaired data sets and eventually approach the results of training on paired data sets, though a large gap remains. The advantages of these algorithms are obvious: the training data set is less restricted and the application range is wider, making them a relatively universal solution. Some of these models rely on predefined similarity functions between input and output, and some assume that input and output must lie in the same low-dimensional embedding space. Within this series of models, the CycleGAN model needs none of these assumptions and limitations, and the idea of cycle consistency makes it stand out; still, there is an insurmountable gap compared with the results of the first category of methods.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
the generation methods based on unpaired images have wider application scenarios. However, compared with a model trained on a paired data set, the realism, image quality, and so on of the pictures such a model generates show an insurmountable gap.
Disclosure of Invention
In view of the deficiencies of the prior art, an object of the embodiments of the present application is to provide an image generation method and apparatus for unpaired data sets.
According to a first aspect of embodiments of the present application, there is provided an image generation method for an unpaired data set, comprising:
improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished;
acquiring an unpaired data set;
inputting the unpaired data set into a trained first model to obtain a first generated data set and a second generated data set generated by the first model;
and inputting the first generated data set and the second generated data set into the trained first submodel and second submodel respectively, and taking the third generated data set and the fourth generated data set generated by the first submodel and the second submodel as the final generation results.
Further, improving the first model comprises the following step:
modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss.
Further, the piecewise loss means that, for two groups of unpaired data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X.
Further, improving the second model comprises the following step:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
Further, taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model and training the improved first model comprises:
constructing two groups of unpaired data sets X and Y whose internal data follow the same distribution, and using them as the input of the improved first model to learn the mapping X → Y and the mapping Y → X;
optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false picture set A paired with the unpaired data set X and a second false picture set B paired with the unpaired data set Y.
Further, training the improved second model with the two groups of paired data sets output after training of the improved first model is finished comprises:
taking the paired data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
taking the paired data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
According to a second aspect of embodiments of the present application, there is provided an image generation apparatus for an unpaired data set, comprising:
the improvement module is used for improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
the training module is used for taking two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired data sets output after training of the improved first model is finished;
the acquisition module is used for acquiring a data set to be paired;
the first input module is used for inputting the data set to be paired into the first model to obtain a first generation data set generated by a first generator of the first model;
and the second input module is used for inputting the first generated data set into the improved second model and taking the second generated data set generated by the second model as a final generated result.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired data sets as described in the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the image generation method for unpaired datasets as described in the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the embodiments, only the unpaired training data set is needed, and the limitation on the data set is smaller. The cost and manpower for data collection are greatly reduced, and the method has universality in an image generation task; through the correction of the second model, the combined model can achieve the result consistent with the application pairing data set training model, so that the generated image is more vivid.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of image generation for an unpaired data set, according to an example embodiment.
FIG. 2 is a diagram illustrating three loss functions in a CycleGAN model according to an exemplary embodiment.
Fig. 3 is a network architecture diagram illustrating a Pix2Pix model according to an example embodiment.
FIG. 4 is a flowchart illustrating training of an improved first model according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating training of an improved second model with two pairs of paired data sets output after training of the improved first model is completed, according to an example embodiment.
FIG. 6 is a block diagram illustrating an image generation apparatus for unpaired datasets in accordance with an exemplary embodiment.
FIG. 7 is a schematic diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "upon," "when," or "in response to determining," depending on the context.
Explanations of terms:
Paired data set: for two data sets $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$ in the source domains, the pictures must correspond one to one, i.e. each $x_i$ and $y_i$ must form a picture pair.
Unpaired data set: for two data sets in the source domains, no picture-by-picture pairing is required; it is only required that the data distribution inside $X$ is consistent and the data distribution inside $Y$ is consistent. That is, only the two sets $X$ and $Y$ need to correspond as wholes; no correspondence between an individual $x_i$ and $y_i$ is required.
GAN model framework: a framework for estimating generative models via an adversarial process. Two models are trained simultaneously in this framework: a generative model $G$ that captures the data distribution, and a discriminative model $D$ that estimates the probability that a sample came from the training data. Both models in the invention are improved on the basis of GAN;
CycleGAN model: learns the mapping between the $X$ picture domain and the $Y$ picture domain without picture pairs (i.e. without two pictures in one-to-one correspondence). The goal is to learn, through the adversarial loss, a mapping $G: X \rightarrow Y$ such that the distribution of $G(X)$ can be close to the distribution of $Y$. Since this mapping is highly under-constrained, it is combined with an inverse mapping $F: Y \rightarrow X$, and a cycle consistency loss $\mathcal{L}_{cyc}$ is introduced to enforce $F(G(x)) \approx x$;
Cycle consistency loss: drawing on the idea of using transitivity to regularize structured data, the cycle consistency loss supervises the training of CycleGAN in a transitive manner;
Adversarial loss: in the GAN framework there is a zero-sum game between the generator and the discriminator, and the adversarial loss quantifies this zero-sum game. The discriminator attempts to judge the false pictures produced by the generator as false (i.e. the smaller the probability of being judged true, the better) and the true pictures in the source domain as true (i.e. the larger the probability of being judged true, the better). The generator, in turn, needs to exploit the discriminator's mistakes: the larger the probability that the discriminator judges the generated false pictures as true, the better.
FIG. 1 is a flow chart illustrating an image generation method for unpaired data sets. As shown in FIG. 1, the method may include the following steps:
step S11: improving the first model and the second model, wherein the second model comprises a first submodel and a second submodel;
step S12: taking two groups of unpaired data sets with the same data distribution in the interior as the input of the improved first model, training the improved first model, and respectively training the improved first sub-model and the improved second sub-model through two groups of paired data sets output after the training of the improved first model is finished;
step S13: acquiring an unpaired data set;
step S14: inputting the unpaired data set into a trained first model to obtain a first generated data set and a second generated data set generated by the first model;
step S15: and inputting the first generation data set and the second generation data set into the trained first sub-model and second sub-model respectively, and taking a third generation data set and a fourth generation data set generated by the first sub-model and the second sub-model as final generation results.
As can be seen from the above steps, only an unpaired training data set is needed, so the restriction on the data set is smaller; the cost and manpower of data collection are greatly reduced, and the method is universal in image generation tasks. Through the correction applied by the second model, the joint model can achieve results consistent with models trained on paired data sets, so that the generated images are more realistic.
In a specific implementation, the first model is a generation model for unpaired data sets, such as CycleGAN, CoGAN, StarGAN, UNIT, and the like, and the first submodel and the second submodel in the second model are generation models for paired data sets, such as Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN, SimGAN, and the like. In this embodiment, the first model adopts the CycleGAN model, and the first submodel and the second submodel of the second model both adopt the Pix2Pix model.
In an implementation of step S11, the first model and the second model are improved, wherein the second model includes a first submodel and a second submodel;
Specifically, improving the first model comprises the following step:
modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss.
Here, the piecewise loss means that, for two groups of unpaired data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X.
In the present embodiment, for the mapping $G: X \rightarrow Y$ and its discriminator $D_Y$, the adversarial loss is:

$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$

where $x \sim p_{data}(x)$ denotes that $x$ obeys the data distribution $p_{data}(x)$, and $y \sim p_{data}(y)$ likewise. $D_Y$ aims to maximize this objective loss, confronted with an adversary $G$ that tries to minimize it, namely:

$\min_{G} \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$

For the mapping $F: Y \rightarrow X$ and its discriminator $D_X$, a similar adversarial loss is introduced, namely:

$\min_{F} \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X)$

The improved (piecewise) cycle consistency loss is defined as:

$\mathcal{L}_{cyc}(G, F) = \begin{cases} \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1], & \text{when learning } G: X \rightarrow Y \\ \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1], & \text{when learning } F: Y \rightarrow X \end{cases}$

The identity mapping loss is:

$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(y) - y \rVert_1] + \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(x) - x \rVert_1]$

The full objective loss of the model is:

$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) + \mathcal{L}_{identity}(G, F)$

where $\lambda$ denotes the weighting parameter of $\mathcal{L}_{cyc}$; it is a hyper-parameter whose value can be adjusted during training to tune the influence of the cycle consistency loss on the overall loss.

The optimization target is:

$G^{*}, F^{*} = \arg \min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$
specifically, the network structure of the improved CycleGAN is divided into two parts, one part is a generator, and the method comprises the following steps: c7s1-64, d128, d256, R256, u128, u64, c7s1-3, wherein c7s1-k denotes a 7 × 7 contribution-InstanceNorm-ReLU layer with k filters and step size 1; dk denotes a 3 × 3 contribution-InstanceNorm-ReLU layer with k filters and step size 2. Reflective padding is used to reduce artifacts; rk represents a residual block comprising two 3 × 3 convolutional layers, with the same number of filters on both layers; uk denotes a 3 x 3 fractional-strained-volume-InstanceNorm-ReLU layer with k filters and a step size 1/2.
The other part is the discriminator: C64-C128-C256-C512, where Ck denotes a 4 × 4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, a convolution is applied to produce a one-dimensional output; InstanceNorm is not used in the first C64 layer; LeakyReLU with a slope of 0.2 is used. A minimal sketch of both parts follows.
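To make the layer shorthand above concrete, the following is a minimal PyTorch sketch of the described generator and discriminator. It is an illustrative reading of the specification, not the claimed implementation; details such as the choice of 9 residual blocks, the Tanh output layer and the exact padding arrangement are assumptions beyond the text.

    import torch
    import torch.nn as nn

    def c7s1(in_ch, out_ch, last=False):
        # c7s1-k: 7x7 Convolution-InstanceNorm-ReLU layer with stride 1 and reflection padding
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, out_ch, 7)]
        layers += [nn.Tanh()] if last else [nn.InstanceNorm2d(out_ch), nn.ReLU(True)]
        return layers

    class ResidualBlock(nn.Module):
        # Rk: two 3x3 convolutional layers with the same number of filters on both layers
        def __init__(self, ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
                nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))

        def forward(self, x):
            return x + self.block(x)

    class Generator(nn.Module):
        # c7s1-64, d128, d256, R256 x n_blocks, u128, u64, c7s1-3
        def __init__(self, n_blocks=9):  # 9 residual blocks for 256x256 inputs (assumption)
            super().__init__()
            layers = c7s1(3, 64)
            for ch in (128, 256):  # dk: 3x3 stride-2 downsampling convolutions
                layers += [nn.Conv2d(ch // 2, ch, 3, stride=2, padding=1),
                           nn.InstanceNorm2d(ch), nn.ReLU(True)]
            layers += [ResidualBlock(256) for _ in range(n_blocks)]
            for ch in (128, 64):   # uk: stride 1/2, i.e. stride-2 transposed convolutions
                layers += [nn.ConvTranspose2d(ch * 2, ch, 3, stride=2, padding=1, output_padding=1),
                           nn.InstanceNorm2d(ch), nn.ReLU(True)]
            layers += c7s1(64, 3, last=True)
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

    class Discriminator(nn.Module):
        # C64-C128-C256-C512: 4x4 Convolution-InstanceNorm-LeakyReLU layers with stride 2;
        # InstanceNorm is omitted in the first C64 layer; a final convolution yields a 1-channel map
        def __init__(self):
            super().__init__()
            layers, in_ch = [], 3
            for i, ch in enumerate((64, 128, 256, 512)):
                layers.append(nn.Conv2d(in_ch, ch, 4, stride=2, padding=1))
                if i > 0:
                    layers.append(nn.InstanceNorm2d(ch))
                layers.append(nn.LeakyReLU(0.2, True))
                in_ch = ch
            layers.append(nn.Conv2d(512, 1, 4, padding=1))
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)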
Specifically, the embodiment of the present application takes the image deformation task of converting a horse into a zebra as an example. The cycle consistency loss is modified into a piecewise function in order to prevent the zebra pictures produced by the generator from failing to correspond one to one with the real input horse pictures. As shown in FIG. 2, FIG. 2 intuitively explains the effect of the three losses in the design of the CycleGAN model. Taking horse pictures as X and zebra pictures as Y, X is input into G and fake_Y is output. At this moment, fake_Y must be a zebra picture that the discriminator $D_Y$ cannot tell apart from real ones, and the loss $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ is designed to ensure this. However, fake_Y should not only be a zebra picture, it should also correspond one to one with X; that is, fake_Y should be the zebra obtained by adding stripes on the basis of X, not an arbitrary zebra picture. To ensure this, the loss $\mathcal{L}_{cyc}$ is designed, and the cycle consistency loss is further set to a piecewise function. When learning the mapping G, only the forward cycle consistency loss is used, and the reverse cycle consistency loss is not added. This is because, when the fake_Y generated by G is a zebra that does not correspond to the original picture X, $\mathcal{L}_{GAN}$ is not enough to identify this error, while the reverse cycle consistency loss would push F to learn to turn fake_Y back into a false picture resembling the original as much as possible. At this point, the reverse cycle consistency loss conceals the error in G, and the zebra pictures finally generated by G cannot correspond one to one with the original pictures X. Symmetrically, the forward cycle consistency loss would shield errors in F, so only the reverse cycle consistency loss is used when training F; a sketch of this piecewise loss follows.
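The piecewise cycle consistency loss described above could be realized, for example, as follows. This is a minimal PyTorch sketch; the function signature and the keyword-argument convention are illustrative assumptions.

    import torch.nn as nn

    l1 = nn.L1Loss()

    def cycle_loss(G, F, real_x=None, real_y=None):
        """Piecewise cycle consistency loss.

        When learning G: X -> Y, only the forward term ||F(G(x)) - x||_1 is used,
        so the reverse term cannot conceal errors in G; symmetrically, when
        learning F: Y -> X, only the reverse term ||G(F(y)) - y||_1 is used.
        """
        if real_x is not None:   # learning G: forward cycle consistency
            return l1(F(G(real_x)), real_x)
        else:                    # learning F: reverse cycle consistency
            return l1(G(F(real_y)), real_y)

In a training step, the update of G would add lambda_cyc * cycle_loss(G, F, real_x=x) to its adversarial and identity terms, and the update of F would add lambda_cyc * cycle_loss(G, F, real_y=y).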
For unpaired data sets consisting of pictures of 256 × 256 or higher resolution, the network structure given in J. Zhu, T. Park, P. Isola and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242-2251, is used. Practice shows that this network structure setting and hyper-parameter setting make the model converge faster and produce better images, so this patent continues to use this network structure. If one of the other GAN-family models mentioned above is used instead, the network structure that performs best for it in the image generation task can likewise be adopted.
Specifically, improving the second model comprises the following step:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
In this embodiment, Dropout is used to add random noise, the adversarial loss and the traditional L1 loss are combined into the final full objective function, and the full objective function is optimized to make the network converge. The improved full objective function of the Pix2Pix model is defined as:

$G^{*} = \arg \min_{G} \max_{D} \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$

where $\mathcal{L}_{cGAN}(G, D)$ is expressed as follows:

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}[\log D(x, y)] + \mathbb{E}_{x, z}[\log(1 - D(x, G(x, z)))]$

and $\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}[\lVert y - G(x, z) \rVert_1]$ is the traditional loss. Here G denotes the generator of the Pix2Pix model, D denotes the discriminator of the Pix2Pix model, and z is the random influence introduced by adding Dropout in the generator network; x and y denote the input of the Pix2Pix model, where x is a false picture generated by the generator of model one and y is the actual real picture corresponding to x. (Assuming the paired data set (Y, B) is obtained after model one is trained, the real picture Y is the y in the formula, and the generated picture B is the x in the formula.)
When the zebra pictures generated by the generator of model one can fool the discriminator $D_Y$, and the generated horse pictures can fool $D_X$, the discriminator $D_Y$ at this point is a mature discriminator capable of identifying whether a picture is a zebra. A discriminator capable of judging whether a picture is a zebra must likewise be trained inside the submodels of model two. In order to accelerate the convergence of the submodels in model two, the adversarial loss $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ from model one is multiplied by a weight factor and added to the full objective function of the submodel; during training, the weight factor is set to 0.01. Similarly, when converting zebras into horses, $\mathcal{L}_{GAN}(F, D_X, Y, X)$ is added to the full objective function in the same way, as sketched below.
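A minimal sketch of how the weighted adversarial term could enter the full objective of a model-two submodel is given below. The weight w = 0.01 comes from the text; the L1 weight of 100, the least-squares form of the transferred term and all names are assumptions.

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    def submodel_generator_loss(G2, D2, D_Y, x_fake, y_real, lambda_l1=100.0, w=0.01):
        """Full objective of a model-two submodel's generator.

        x_fake: picture generated by model one (input of Pix2Pix);
        y_real: the corresponding real picture;
        D_Y:    the frozen, already-trained discriminator taken from model one;
        w:      preset weight factor for the transferred adversarial loss (0.01).
        """
        y_fake = G2(x_fake)
        pred = D2(torch.cat([x_fake, y_fake], dim=1))  # conditional PatchGAN judges the (input, output) pair
        loss_cgan = bce(pred, torch.ones_like(pred))   # generator wants D2 to judge the pair as real
        loss_l1 = l1(y_fake, y_real)                   # traditional L1 term of Pix2Pix
        # adversarial loss inherited from model one, in least-squares form, times the weight factor
        loss_from_model_one = torch.mean((D_Y(y_fake) - 1) ** 2)
        return loss_cgan + lambda_l1 * loss_l1 + w * loss_from_model_one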
The network structure of the Pix2Pix model is shown in FIG. 3, where X represents a zebra picture generated by the first model and Real_y represents the real horse picture corresponding to X. The generator G uses a U-Net structure, and the discriminator D uses a PatchGAN structure. Ck denotes a Convolution-BatchNorm-ReLU layer with k filters; CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with k filters and a Dropout ratio of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. The convolutions in the encoder and in the discriminator downsample by a factor of 2, while those in the decoder upsample by a factor of 2. The generator architecture is as follows: the encoder is C64-C128-C256-C512-C512-C512, and the U-Net decoder is CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128. The 70 × 70 discriminator structure is: C64-C128-C256-C512. This network structure is the better-performing structure proposed in the Pix2Pix paper: P. Isola, J. Zhu, T. Zhou and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017.632. A minimal sketch of the discriminator part follows.
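As a companion to the architecture description, the following sketches the 70 × 70 PatchGAN discriminator (C64-C128-C256-C512) judging an (input, output) picture pair. The BatchNorm placement and the stride-1 last block follow common practice for a 70 × 70 receptive field and are assumptions here.

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        # C64-C128-C256-C512 on the channel-wise concatenation of the two pictures,
        # followed by a final convolution producing a patch map of real/fake scores.
        def __init__(self, in_ch=6):  # 6 = two RGB pictures concatenated
            super().__init__()
            seq, ch = [], in_ch
            for i, out in enumerate((64, 128, 256, 512)):
                seq.append(nn.Conv2d(ch, out, 4, stride=2 if i < 3 else 1, padding=1))
                if i > 0:
                    seq.append(nn.BatchNorm2d(out))
                seq.append(nn.LeakyReLU(0.2, True))
                ch = out
            seq.append(nn.Conv2d(512, 1, 4, stride=1, padding=1))
            self.net = nn.Sequential(*seq)

        def forward(self, x, y):
            # each spatial score judges a 70x70 receptive-field patch of the (x, y) pair
            return self.net(torch.cat([x, y], dim=1))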
In the specific implementation of step S12, two groups of unpaired data sets, each with internally consistent data distribution, are used as the input of the improved first model; the improved first model is trained; and the improved first submodel and second submodel are trained respectively with the two groups of paired data sets output after the training of the improved first model is finished;
specifically, as shown in fig. 4, the process of training the improved first model includes:
step S21: two groups of unpaired data sets X and Y with internal data in the same distribution are constructed and are used as the input of the improved first model to learn mapping X → Y and mapping Y → X;
specifically, taking the generation of an image of a horse converted into a zebra in an image deformation task as an example, two sets of datasets of the horse and the zebra are downloaded from an ImageNet website. The data set for the horse contained 939 pictures and the data set for the zebra contained 1177 images. All picture data is scaled so that the resolution of each picture is 256 × 256.
Step S22: optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false picture set A paired with the unpaired data set X and a second false picture set B paired with the unpaired data set Y.
Specifically, the adversarial loss, the improved cycle consistency loss and the identity mapping loss are calculated, and the full objective function is optimized. An Adam solver is applied to compute the gradients and update the network parameters. To make the model more stable during training, the generators are trained with a least-squares loss instead of the negative log-likelihood objective; that is, the generator G is trained to minimize:

$\mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G(x)) - 1)^2]$

and the discriminator $D_Y$ is trained to minimize:

$\mathbb{E}_{y \sim p_{data}(y)}[(D_Y(y) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[D_Y(G(x))^2]$

To reduce model oscillation, the discriminators are updated using a history of generated images: an image buffer storing the 60 previously generated images is kept during training.
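The least-squares objectives and the image history buffer could look as follows. This is a minimal sketch; the buffer's 50%-swap replacement policy mirrors the common CycleGAN implementation and is an assumption beyond the text.

    import random
    import torch

    def g_adv_loss(D_Y, fake_y):
        # generator minimizes E[(D_Y(G(x)) - 1)^2]
        return torch.mean((D_Y(fake_y) - 1) ** 2)

    def d_adv_loss(D_Y, real_y, fake_y):
        # discriminator minimizes E[(D_Y(y) - 1)^2] + E[D_Y(G(x))^2]
        return torch.mean((D_Y(real_y) - 1) ** 2) + torch.mean(D_Y(fake_y.detach()) ** 2)

    class ImageBuffer:
        """Stores the 60 previously generated images so the discriminator
        can be updated with historical data, reducing model oscillation."""
        def __init__(self, size=60):
            self.size, self.images = size, []

        def query(self, image):
            if len(self.images) < self.size:   # buffer not yet full: keep and return the new image
                self.images.append(image)
                return image
            if random.random() < 0.5:          # otherwise, half the time swap in a stored image
                idx = random.randrange(self.size)
                old, self.images[idx] = self.images[idx], image
                return old
            return image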
When the improved first model converges, two sets of false pictures are output. One is the set of zebra images generated from the horse images $x_i$ in the horse data set, recorded as A; the other is the set of horse images generated from the zebra images $y_i$ in the zebra data set, recorded as B.
Specifically, training the improved second model with the two groups of paired data sets output after the training of the improved first model is finished, as shown in FIG. 5, includes:
Step S31: taking the paired data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
in a specific implementation, Dropout is added to the generator part to increase randomness during model training, and Adam updater is used to perform gradient solving and update network parameters. The learning rate remains unchanged for the first 100 epochs and then decays to 0 in a linear trend. And when the false picture generated by the first submodel Pix2Pix can deceive the discriminator of the first submodel Pix2Pix, the first submodel Pix2Pix model converges, and the training is finished. In this step, to prevent the network structure from being too complex, the user may reduce the network of the Pix2Pix generator part, e.g. 2-3 blocks may be used. The first sub-model Pix2Pix generator will output a new zebra picture, which is recorded as
Figure 636842DEST_PATH_IMAGE043
I.e. the conversion of the horse into the final result of the zebra image generation task.
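The described learning-rate schedule, constant for the first 100 epochs and then decaying linearly to 0, can be expressed with a LambdaLR scheduler. The total of 200 epochs, the learning rate of 2e-4 and the Adam betas follow common GAN practice and are assumptions here; G2 denotes the (assumed already defined) submodel generator.

    import torch

    optimizer = torch.optim.Adam(G2.parameters(), lr=2e-4, betas=(0.5, 0.999))

    n_keep, n_decay = 100, 100  # constant for 100 epochs, then decay linearly over 100 more

    def lr_lambda(epoch):
        # multiplicative factor: 1.0 up to epoch 100, then a linear ramp down to 0
        return 1.0 - max(0, epoch - n_keep) / float(n_decay)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
    # call scheduler.step() once per epoch, after the training-loop body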
Step S32: taking the paired data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
The specific implementation of step S32 is the same as the specific implementation of step S31, and is not described herein again.
In a specific implementation of step S13, obtaining an unpaired data set;
specifically, taking the generation of an image of a horse converted into a zebra in an image deformation task as an example, two sets of datasets of the horse and the zebra are downloaded from an ImageNet website. The data set of the horse comprises 939 pictures, and the data set of the zebra comprises 1177 pictures. All picture data is scaled so that the resolution of each picture is 256 × 256. A set of unpaired datasets (horses, zebras) was constructed. The data set uses the public data set on the ImageNet website, so that on one hand, the cost of data collection by the data set is saved, on the other hand, the public data set has higher quality, other similar models also use the data set for training, and after the data sets are unified, the training effects of a plurality of models are easier to compare.
In a specific implementation of step S14, after the unpaired data set is input into the trained first model, a first generated data set and a second generated data set generated by the first model are obtained;
specifically, the data sets of the horse and the zebra are input into an improved first model, and two false pictures are output after the training of the first model converges. One is to centralize the horse by the horse data
Figure 335808DEST_PATH_IMAGE003
Image of the zebra generated by the image of (2), and is recorded as
Figure 458485DEST_PATH_IMAGE040
The other is zebra data set
Figure 559296DEST_PATH_IMAGE041
Image of horse, noted
Figure 742015DEST_PATH_IMAGE042
. Finally, two sets of paired data sets are constructed, i.e.
Figure 695321DEST_PATH_IMAGE044
And
Figure 621689DEST_PATH_IMAGE045
preparing an input data set for training the second model. Since the first model is a generated picture trained based on an unpaired dataset. Current research shows that, compared with a model trained using a paired data set, a generation method based on an unpaired image has insurmountable differences in the authenticity, image quality, and the like of a generated image of the model from an image generated by training the model using the paired data set. Thus, the present invention combines the picture generated by the first model and the real picture into a paired data set, which is used as an input for the second model. And further correcting the generated picture to narrow the difference of the description.
In a specific implementation of step S15, the first and second generated data sets are respectively input into the trained first and second submodels, and the third and fourth generated data sets generated by the first and second submodels are used as final generated results.
In a specific implementation, the data sets generated by the first model are input into the first submodel and the second submodel of the improved second model respectively: the data set (X, A) is input into the first submodel of the improved second model, and the data set (B, Y) is input into the second submodel of the improved second model. The purpose of this step is to correct A and B separately. When A and B can respectively fool the discriminators of the first submodel and the second submodel, the generators of the first submodel and the second submodel output the final results X_result and Y_result respectively, where X_result represents the zebra pictures corresponding to X, and Y_result represents the horse pictures corresponding to Y.
The benefit of this design is that the first model outputs two data sets, namely the zebra data set A corresponding to the horse data set X and the horse data set B corresponding to the zebra data set Y; the invention designs the two-submodel structure in the second model precisely in order to correct these two generated data sets. This makes the joint model more complete: the whole model can output pictures in two styles, i.e. it outputs a set of pictures in both the horse style and the zebra style.
In correspondence with the aforementioned embodiments of the image generation method for unpaired datasets, the present application also provides embodiments of an image generation apparatus for unpaired datasets.
FIG. 6 is a block diagram illustrating an image generation apparatus for unpaired datasets in accordance with an exemplary embodiment. Referring to fig. 6, the apparatus may include:
an improvement module 21, configured to improve a first model and a second model, where the second model includes a first sub-model and a second sub-model;
the training module 22 is configured to take two groups of unpaired data sets, each with internally consistent data distribution, as the input of the improved first model, train the improved first model, and train the improved first submodel and second submodel respectively with the two groups of paired data sets output after the training of the improved first model is finished;
the obtaining module 23 is configured to obtain a data set to be paired;
a first input module 24, configured to input the data set to be paired into the first model, and obtain a first generated data set generated by a first generator of the first model;
and a second input module 25, configured to input the first generated data set into the improved second model, and use the second generated data set generated by the second model as a final generated result.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application further provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired data sets as described above. FIG. 7 is a hardware structure diagram of a device with data processing capability on which the image generation method for unpaired data sets is deployed. In addition to the processor, memory, and network interface shown in FIG. 7, the device in an embodiment may also include other hardware according to its actual function, which is not described again here.
Accordingly, the present application also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the image generation method for unpaired data sets as described above. The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory Card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit of a device with data processing capability and an external storage device. The computer readable storage medium is used for storing the computer program and the other programs and data required by that device, and may also be used for temporarily storing data that has been or will be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (6)

1. An image generation method for unpaired datasets, comprising:
improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel, the first model is a generation model for unpaired image data sets selected from CycleGAN, CoGAN, StarGAN and UNIT, and the first submodel and the second submodel in the second model are generation models for paired image data sets selected from Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN and SimGAN;
taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired image data sets output after the training of the improved first model is finished, wherein the training of the first model is suspended when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; the training of the first submodel is stopped when the generation results of the second generator of the first submodel can fool the second discriminator of the first submodel; and the training of the second submodel is stopped when the generation results of the third generator of the second submodel can fool the third discriminator of the second submodel;
acquiring an unpaired image dataset;
inputting the unpaired image data set into a trained first model to obtain a first generated image data set and a second generated image data set generated by the first model;
inputting the first generated image data set and the second generated image data set into a trained first sub-model and a trained second sub-model respectively, and taking a third generated image data set and a fourth generated image data set generated by the first sub-model and the second sub-model as final generation results;
wherein improving the first model comprises: modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss;
the piecewise loss means that, for two groups of unpaired image data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X;
wherein improving the second model comprises:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
2. The method of claim 1, wherein taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model and training the improved first model comprises:
constructing two groups of unpaired image data sets X and Y whose internal data follow the same distribution, and using them as the input of the improved first model to learn the mapping X → Y and the mapping Y → X;
optimizing the improved first model through the adversarial loss, the identity mapping loss and the improved cycle consistency loss; suspending the training of the model when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; and outputting a first false image set A paired with the unpaired image data set X and a second false image set B paired with the unpaired image data set Y.
3. The method of claim 1, wherein training the improved second model with the two groups of paired image data sets output after the training of the improved first model is finished comprises:
taking the paired image data set (X, A) as the input of the second generator of the first submodel in the second model, so that the second generator learns the mapping X → A; updating the second discriminator of the first submodel through the improved full objective function; and stopping training once the generation results of the second generator can fool the second discriminator;
taking the paired image data set (B, Y) as the input of the third generator of the second submodel in the second model, so that the third generator learns the mapping B → Y; updating the third discriminator of the second submodel through the improved full objective function; and stopping training once the generation results of the third generator can fool the third discriminator.
4. An image generation apparatus for unpaired datasets, comprising:
the improvement module is used for improving a first model and a second model, wherein the second model comprises a first submodel and a second submodel, the first model is a generation model for unpaired image data sets selected from CycleGAN, CoGAN, StarGAN and UNIT, and the first submodel and the second submodel in the second model are generation models for paired image data sets selected from Pix2Pix, DiscoGAN, DRAGAN, DualGAN, BicycleGAN, BiGAN and SimGAN;
the training module is used for taking two groups of unpaired image data sets, each with internally consistent data distribution, as the input of the improved first model, training the improved first model, and training the improved first submodel and second submodel respectively with the two groups of paired image data sets output after the training of the improved first model is finished, wherein the training of the first model is suspended when the full objective loss fluctuates within a preset range or the number of training iterations reaches a preset threshold; the training of the first submodel is stopped when the generation results of the second generator of the first submodel can fool the second discriminator of the first submodel; and the training of the second submodel is stopped when the generation results of the third generator of the second submodel can fool the third discriminator of the second submodel;
the acquisition module is used for acquiring an image data set to be paired;
the first input module is used for inputting the image data set to be paired into the first model to obtain a first generated image data set generated by a first generator of the first model;
a second input module, configured to input the first generated image data set into an improved second model, and use a second generated image data set generated by the second model as a final generation result;
wherein improving the first model comprises: modifying the cycle consistency loss in the first model into a piecewise loss, so as to remove the error-shielding term in the cycle consistency loss;
the piecewise loss means that, for two groups of unpaired image data sets X and Y whose internal data follow the same distribution:
a forward cycle consistency loss is used when the first model is learning the mapping X → Y;
a reverse cycle consistency loss is used when the first model is learning the mapping Y → X;
wherein improving the second model comprises:
multiplying the corresponding adversarial loss in the improved first model by a preset weight factor and adding the product to the full objective functions of the first submodel and the second submodel respectively.
5. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the image generation method for unpaired datasets of any one of claims 1-3.
6. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, carry out the steps of the image generation method for unpaired datasets as claimed in any one of claims 1 to 3.
CN202210661703.3A 2022-06-13 2022-06-13 Image generation method and device for unpaired data set Active CN114758035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210661703.3A CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210661703.3A CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Publications (2)

Publication Number Publication Date
CN114758035A CN114758035A (en) 2022-07-15
CN114758035B true CN114758035B (en) 2022-09-27

Family

ID=82336449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210661703.3A Active CN114758035B (en) 2022-06-13 2022-06-13 Image generation method and device for unpaired data set

Country Status (1)

Country Link
CN (1) CN114758035B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529949A (en) * 2020-12-08 2021-03-19 北京安德医智科技有限公司 Method and system for generating DWI image based on T2 image
CN114331821A (en) * 2021-12-29 2022-04-12 中国人民解放军火箭军工程大学 Image conversion method and system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10643320B2 (en) * 2017-11-15 2020-05-05 Toyota Research Institute, Inc. Adversarial learning of photorealistic post-processing of simulation with privileged information
CN108615073B (en) * 2018-04-28 2020-11-03 京东数字科技控股有限公司 Image processing method and device, computer readable storage medium and electronic device
CN109064389B (en) * 2018-08-01 2023-04-18 福州大学 Deep learning method for generating realistic images by hand-drawn line drawings
US11048974B2 (en) * 2019-05-06 2021-06-29 Agora Lab, Inc. Effective structure keeping for generative adversarial networks for single image super resolution
WO2021092686A1 (en) * 2019-11-15 2021-05-20 Modiface Inc. Image-to-image translation using unpaired data for supervised learning
EP4060572A4 (en) * 2019-12-26 2023-07-19 Telefónica, S.A. Computer-implemented method for accelerating convergence in the training of generative adversarial networks (gan) to generate synthetic network traffic, and computer programs of same
CN112488243A (en) * 2020-12-18 2021-03-12 北京享云智汇科技有限公司 Image translation method
KR102288759B1 (en) * 2021-03-26 2021-08-11 인하대학교 산학협력단 Method and Apparatus for Construction of Controllable Image Dataset in Generative Adversarial Networks
CN113643400B (en) * 2021-08-23 2022-05-24 哈尔滨工业大学(威海) Image generation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529949A (en) * 2020-12-08 2021-03-19 北京安德医智科技有限公司 Method and system for generating DWI image based on T2 image
CN114331821A (en) * 2021-12-29 2022-04-12 中国人民解放军火箭军工程大学 Image conversion method and system

Also Published As

Publication number Publication date
CN114758035A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111199550B (en) Training method, segmentation method, device and storage medium of image segmentation network
WO2021254499A1 (en) Editing model generation method and apparatus, face image editing method and apparatus, device, and medium
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
CN112507617B (en) Training method of SRFlow super-resolution model and face recognition method
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
WO2022160657A1 (en) High-definition face swap video generation method and system
CN110288513B (en) Method, apparatus, device and storage medium for changing face attribute
CN112861659B (en) Image model training method and device, electronic equipment and storage medium
CN109409201A (en) A kind of pedestrian's recognition methods again based on shared and peculiar dictionary to combination learning
Iskakov Semi-parametric image inpainting
CN111898571A (en) Action recognition system and method
Dong et al. Self-supervised colorization towards monochrome-color camera systems using cycle CNN
Liu et al. Facial image inpainting using multi-level generative network
Huang et al. Steganography embedding cost learning with generative multi-adversarial network
CN112541566B (en) Image translation method based on reconstruction loss
CN114758035B (en) Image generation method and device for unpaired data set
CN109598201B (en) Action detection method and device, electronic equipment and readable storage medium
CN116129417A (en) Digital instrument reading detection method based on low-quality image
Teng et al. Unimodal face classification with multimodal training
CN113344792B (en) Image generation method and device and electronic equipment
CN116546304A (en) Parameter configuration method, device, equipment, storage medium and product
CN116977794B (en) Digital human video identification model training method and system based on reinforcement learning
KR102678473B1 (en) Automatic Caricature Generating Method and Apparatus
CN115938546B (en) Early gastric cancer image synthesis method, system, equipment and storage medium
JP6674393B2 (en) Feature amount registration device, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant