CN112434798A - Multi-scale image translation method based on semi-supervised learning - Google Patents

Multi-scale image translation method based on semi-supervised learning

Info

Publication number
CN112434798A
CN112434798A
Authority
CN
China
Prior art keywords
model
scale
generator
image
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011500040.4A
Other languages
Chinese (zh)
Inventor
冷勇 (Leng Yong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangyun Zhihui Technology Co ltd
Original Assignee
Beijing Xiangyun Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiangyun Zhihui Technology Co ltd filed Critical Beijing Xiangyun Zhihui Technology Co ltd
Priority to CN202011500040.4A priority Critical patent/CN112434798A/en
Publication of CN112434798A publication Critical patent/CN112434798A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-scale image translation method based on semi-supervised learning. A generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework and discriminates multiple scales of the generated image simultaneously, preventing unreasonable forged objects from appearing locally in the generated image. The method combines the generative adversarial network with dual learning to train the model on unpaired data, improving the effect of semi-supervised image translation: the unpaired image information is used effectively to raise model performance and to reduce the dependence of model training on paired images. The multi-scale image translation method based on semi-supervised learning overcomes the drawback that supervised image translation algorithms require large amounts of training data, while also accelerating model convergence and improving model performance. The effect is significant and the method is suitable for wide adoption.

Description

Multi-scale image translation method based on semi-supervised learning
Technical Field
The invention relates to the technical field of image translation, in particular to a multi-scale image translation method based on semi-supervised learning.
Background
Image translation refers to the task of automatically transforming one representation of an image's scene into another. Convolutional neural networks are currently used as image translation models, but the model architecture, loss function design, and optimization strategy differ from task to task, which greatly increases the burden of model design.
Minimizing the Euclidean distance between the model prediction and the ground-truth label causes the model to output blurred images. Image translation algorithms based on supervised learning require large amounts of paired input-output training data; existing image translation methods rely on supervised models that need many paired examples, and in practice paired data is difficult and expensive to obtain.
To address these problems, namely the complex construction of task-specific models, the heavy model design burden, and the large amount of data required for model training and learning, the present method is designed to achieve efficient and accurate image translation.
Disclosure of Invention
In view of these shortcomings, the technical problem addressed by the invention is to provide a multi-scale image translation method based on semi-supervised learning, so as to solve the prior-art problems of complex task-specific model construction, heavy model design burden, and the large amount of data required for model training and learning.
The invention provides a multi-scale image translation method based on semi-supervised learning, comprising the following specific steps:
supervised training of the model on paired data using a multi-scale generative adversarial network;
and unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, to obtain a high-performance image translation model.
Preferably, the supervised training of the model on paired data with the multi-scale generative adversarial network specifically includes:
acquiring a data set {x_i, y_i} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
simultaneously training the two dual convolutional network models G and F on the paired image data in a supervised manner, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
optimizing the model by minimizing the L1 distance between the output and the target;
and, based on the two dual convolutional network models G and F, obtaining a model whose generator output is consistent with the target domain through the discriminators of the multi-scale generative adversarial network.
Preferably, the training based on the discrimination loss of the multi-scale generative adversarial network comprises the following specific steps:
discriminating images at different scales with a plurality of discriminators D_i, where each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))),
where x and y are paired image data from the data set, G(x) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, D_i(y) is the classification probability that discriminator D_i assigns to y, and D_i(G(x)) is the classification probability that D_i assigns to G(x);
combining the gradients of the loss functions of all discriminators by averaging, giving the total discriminator loss function
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y);
iterating the generator based on the gradients of the loss functions of all discriminators, giving the generator G the loss function λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
Preferably, for the two dual convolutional network models G and F, the loss functions of their discriminators obtained through multi-scale generative adversarial training are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1,
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1, where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
Preferably, the unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, comprises the following specific steps:
acquiring two unpaired data sets X and Y;
based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data {x_j}, {y_j} to obtain two translation models G: x → y and F: y → x that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y;
while the two translation models are obtained through training, judging the difference between the generated data and the real data of the target domain with the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
and training the model according to this difference so that its output distribution fits the distribution of the target-domain images, obtaining a generator whose output samples are consistent with the target domain.
Preferably, the cycle consistency satisfied by the two translation models is specifically: for any x ∈ X: x → G(x) → F(G(x)) ≈ x, and for any y ∈ Y: y → F(y) → G(F(y)) ≈ y.
Preferably, the objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
Preferably, the cycle consistency loss function of the high-performance image translation model is:
λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1;
the adversarial loss and the cycle consistency loss function are optimized to train the model, and the total loss function of the generator is: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio of the two losses.
Compared with the prior art, the multi-scale image translation method based on semi-supervised learning has the following advantages. The generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework, so the model is simple to construct and the model design burden is light; discriminating multiple scales of the generated image simultaneously prevents unreasonable forged objects from appearing locally in the generated image. The method combines the generative adversarial network with dual learning to train the model on unpaired data, improving the effect of semi-supervised image translation: the unpaired image information is used effectively to raise model performance and to reduce the dependence of model training on paired images. The method overcomes the drawback that supervised image translation algorithms require large amounts of training data, accelerates model convergence, improves model performance, and solves the prior-art problems of complex task-specific model construction, heavy model design burden, and the large amount of data required for training and learning. The effect is significant and the method is suitable for wide adoption.
Drawings
In order to illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a first process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
fig. 2 is a second process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
fig. 3 is a third process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 4 is a block diagram of a process of training two models through supervised learning in a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 5 is a block diagram of a process of improving performance through dual learning of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 6 is a comparison diagram obtained by translating 3 made-up face images of different people;
fig. 7 is a comparison diagram obtained by translating 3 face images of the same person with different makeup styles.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 to 5, a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention is described below. The method is a semi-supervised image translation method based on the L1 loss, the cycle consistency of dual learning, and the adversarial loss of a multi-scale generative adversarial network. In the supervised training phase, the model is trained by optimizing the L1 loss and the discrimination loss; in the unsupervised learning phase, the model is trained based on the cycle consistency of dual learning and the adversarial loss of the multi-scale generative adversarial network. The specific steps are as follows:
S1, supervised training of the model on paired data using a multi-scale generative adversarial network;
the generative adversarial network consists of a pair of networks, a generator network and a discriminator network: the generator models the latent distribution of the data and generates new samples, while the discriminator judges the difference between real samples and generated samples. Semi-supervised training of the model involves two data sets: a paired data set {x_i, y_i}, and an unpaired data set consisting of X-domain images {x_j} and Y-domain images {y_j}.
The specific implementation steps of the step can be as follows:
S1.1, acquiring a data set {x_i, y_i} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
S1.2, simultaneously training the two dual convolutional network models G and F on the paired image data in a supervised manner, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
The discriminator D is a differentiable function whose inputs are the real data x and, respectively, samples produced from a random variable z; the differentiable function G(z) is a sample generated by the generator that follows the real data distribution as closely as possible. The discriminator can be a multi-scale discriminator with shared weights; both the generator and the discriminator use fully convolutional networks, and the generator architecture can adopt a U-Net structure so that low-level information of the input image is exploited during image translation. In the fully convolutional discrimination network, different layers correspond to activation values at different image scales: the activations of the lower layers correspond to small image patches, while the activations of the upper layers correspond to the whole image. A single network therefore represents multiple discriminators and performs discrimination of multi-scale images simultaneously.
The discriminator of the generative adversarial network is a multi-class classification network, and D_i are the classification probabilities of the different classes.
The input network transforms the original image into a feature space through the domain mapping function F: X → Y such that F(x_i) ≈ y_i, and the output network G maps the features to the target-domain image; the domain mapping function G: X → Y is obtained from the function F such that F(x) ≈ F(G(x)).
S1.3, optimizing a model based on the L1 distance between the minimized output and the target;
s1.4, based on the dual two convolution network models G and F, obtaining a model with the output of the generator consistent with the target domain through a discriminator of a multi-scale generation countermeasure network.
In the multi-scale generative adversarial network, different scales correspond to different discriminators, and each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))).
The overall loss function of the discriminator is:
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y).
The loss function of the generator is: λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1. Different convolutional layers yield feature maps of different scales, and the activation values of the different feature layers are the feature values extracted from image patches of different scales. The feature maps of the different layers can be transformed by 1×1 convolutions into single-channel feature maps of the corresponding scale, and the output probability of the discriminator is then obtained through an activation function.
The network is trained by alternately training the discriminator D and the generator G, using mini-batch stochastic gradient descent with the Adam optimizer. The discriminator is also trained on images produced by past versions of the generator, so that it retains a memory of previously generated samples; this mitigates the mode collapse problem of generative adversarial networks. The learning process of the generative adversarial network is a continual game of iterative optimization between the generator and the discriminator. Specifically: fix the generator G and optimize the discriminator D to maximize its discrimination accuracy; then fix the discriminator D and optimize the generator G to minimize D's discrimination accuracy, finally reaching a globally optimal solution. Training the discriminator D is a process of minimizing cross entropy. The model captures the high-frequency components of the image through the adversarial loss, and the multi-scale discriminator of the generative adversarial network forces the output of the model to be consistent with the target image.
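The alternating game described above can be sketched as a training-loop skeleton. The update functions here are stubs standing in for real optimizer steps (the patent specifies mini-batch SGD with Adam), and the bounded buffer of past generator outputs models the "memory" of earlier samples used to mitigate mode collapse; all names and the buffer size are illustrative assumptions.

```python
def train_adversarial(num_steps, d_step, g_step, history_size=50):
    # d_step(buffer): fix G, update D on real samples plus buffered fakes.
    # g_step(): fix D, update G; returns the sample G produced.
    fake_history = []   # past generator outputs shown to D
    schedule = []       # records the D/G alternation
    for _ in range(num_steps):
        d_step(fake_history)
        schedule.append("D")
        fake_history.append(g_step())
        schedule.append("G")
        del fake_history[:-history_size]  # keep only recent fakes
    return schedule, fake_history
```

Each iteration first updates D with the generator fixed, then updates G with the discriminator fixed, exactly the alternation the text describes.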
S2, unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the generative adversarial network, to obtain a high-performance image translation model.
In the multi-scale generative adversarial network, images of different scales are discriminated by multiple discriminators. The generator iterates the model based on the gradients of the losses of all discriminators, which are combined by averaging. The specific steps are as follows:
Discriminating images at different scales with a plurality of discriminators D_i, where each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))),
where x and y are paired image data from the data set, G(x) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, D_i(y) is the classification probability that discriminator D_i assigns to y, and D_i(G(x)) is the classification probability that D_i assigns to G(x);
combining the gradients of the loss functions of all discriminators by averaging, giving the total discriminator loss function
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y);
iterating the generator based on the gradients of the loss functions of all discriminators, giving the generator G the loss function λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
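The per-scale discriminator loss, its average over all scales, and the generator loss described in the steps above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: the discriminator outputs are taken as pre-computed classification probabilities, and images are flattened lists of pixel values.

```python
import math

def discriminator_loss_i(d_real, d_fake):
    # Per-scale loss for discriminator D_i:
    # -log D_i(y) - log(1 - D_i(G(x)))
    return -math.log(d_real) - math.log(1.0 - d_fake)

def total_discriminator_loss(d_reals, d_fakes):
    # Combine the per-scale losses of all N discriminators by averaging.
    losses = [discriminator_loss_i(r, f) for r, f in zip(d_reals, d_fakes)]
    return sum(losses) / len(losses)

def generator_loss(d_fakes, g_x, y):
    # Generator loss: -sum_i log D_i(G(x)) + ||G(x) - y||_1
    adv = -sum(math.log(f) for f in d_fakes)
    l1 = sum(abs(a - b) for a, b in zip(g_x, y))
    return adv + l1
```

For instance, with two scales whose discriminators assign probability 0.9 to real samples and 0.1 to fakes, the averaged discriminator loss is about 0.21.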
For the two dual convolutional network models G and F, the loss functions of their discriminators obtained through multi-scale generative adversarial training are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1,
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1, where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
The specific implementation steps of the step can be as follows:
S2.1, acquiring two unpaired data sets X and Y, i.e. the two image data sets {x_j} and {y_j};
S2.2, based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data {x_j}, {y_j} to obtain two translation models G: x → y and F: y → x that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y, and G and F form a closed mapping loop;
two translation models G: x → y and F: y → X can form a forward cycle X → Y → X and a reverse cycle Y → X → Y, with the resulting output for each cycle being identical to the input, satisfying cycle consistency. Wherein for any X ∈ X: x → G (x) → F (G (x)) ≈ x, and for an arbitrary Y ∈ Y: y → F (y) → G (F (y) ≈ y).
The performance of the model is improved by utilizing unpaired data based on the cycle consistency of the dual learning model, and the obtained objective function is as follows: lambda [ alpha ]cons(G,F,x,y)=||F(G(x))-x||1+||G(F(y))-y||1
S2.3, after the two translation models are obtained through training, judging the difference between the generated data and the real data of the target domain with the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
In the multi-scale generative adversarial network, different scales correspond to different discriminators, and each discriminator has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))).
The overall loss function of the discriminator is:
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y).
The corresponding generator loss function is: λ_G(G, D, x, y) = -Σ_i log D_i(G(x)).
S2.4, training the model according to this difference so that its output distribution fits the distribution of the target-domain images, obtaining a generator whose output samples are consistent with the target domain.
The discriminator of the generative adversarial network forces the output distribution of the model to fit the distribution of the target-domain images, so that the unpaired data improves the performance of the model.
The objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
The cycle consistency loss function is: λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1.
The adversarial loss and the cycle consistency loss function are optimized simultaneously to train the model, and the total loss function of the generator is: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio of the two losses.
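Combining the terms above, the total generator objective on unpaired data can be sketched as follows. The discriminator outputs are taken as pre-computed probabilities, and the default value of the hyperparameter λ is an assumption for illustration; the patent does not state one.

```python
import math

def unpaired_generator_loss(dY_of_Gx, dX_of_Fy, cons_loss, lam=10.0):
    # l_unpaired = l_G + l_F + lambda * l_cons, where
    # l_G = -log D_Y(G(x)) and l_F = -log D_X(F(y)).
    l_G = -math.log(dY_of_Gx)
    l_F = -math.log(dX_of_Fy)
    return l_G + l_F + lam * cons_loss
```

The hyperparameter lam trades off how strongly the cycle-consistency term constrains the two generators relative to the adversarial terms.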
The multi-scale image translation method based on semi-supervised learning first trains the model with a small amount of paired data. Compared with the prior art, the generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework, and discriminates multiple scales of the generated image simultaneously, preventing unreasonable forged objects from appearing locally in the generated image.
Supervised training of the model with the multi-scale-discrimination generative adversarial network improves the generation quality of the original generative adversarial network: the model captures both the overall structural outline of the image and its local details, making the generated details more plausible. The game between the discriminator and the generator forces the generator toward the real sample distribution, while the unpaired data further improves the model and reduces the image translation task's dependence on paired images.
The model is then trained in an unsupervised manner based on the cycle consistency of dual learning and the discrimination loss of the generative adversarial network, which uses the information in the unpaired images effectively to improve model performance and to reduce the dependence of model training on paired images. Through multi-scale discrimination, the generative adversarial network applies unpaired image data to model training, and the adversarial loss forces the samples produced by the generator toward the target images, reducing the need for paired samples and improving the translation results; combining the generative adversarial network with dual learning to train on unpaired data improves the effect of semi-supervised image translation. The method overcomes the drawback that supervised image translation algorithms require large amounts of training data, while also accelerating model convergence and improving model performance.
The following is an example. The image translation method is applied to a face makeup-removal task: 100 original face images were collected from the internet, makeup software was used to generate made-up face images, and a data set of 800 original-face/made-up-face image pairs was obtained for training and testing the model. In the experiment, the data set was divided into 4 parts: 20% paired data for supervised training, 60% unpaired data for unsupervised training, 10% for validation, and 10% for testing the performance of the algorithm. The L1 distance between the model output on the test set and the corresponding ground-truth label is used to evaluate the algorithm.
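The evaluation metric, the mean L1 distance between model outputs and ground-truth labels over the test set, can be sketched as follows (images again as flat lists of pixel values; all names are illustrative).

```python
def mean_l1_distance(outputs, labels):
    # Average per-pixel absolute difference across the whole test set;
    # smaller values mean the translated images are closer to the real ones.
    total, count = 0.0, 0
    for out, label in zip(outputs, labels):
        for p, t in zip(out, label):
            total += abs(p - t)
            count += 1
    return total / count
```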
The specific procedure was as follows. An experimental group and a control group were set up: the experimental group used the multi-scale image translation method based on semi-supervised learning, and the control group used a model trained with the L1 loss function. 3 made-up face images of different people were randomly drawn from the data set, and the models obtained by the experimental and control groups were used to remove the makeup and restore the original face images; in addition, face images of the same person with different makeup styles were randomly selected from the test set and converted to the original images through the same makeup-removal task. Referring to fig. 6 and 7, each row shows, from left to right, the made-up face image, the original face image, the makeup-removal result of the experimental group, and the makeup-removal result of the control group. Analysis yields the following experimental results:
The L1 distance of the experimental group on the face makeup-removal task was 0.034, while that of the control group was 0.089. A smaller L1 distance means a smaller difference between the translated image and the real image, i.e. a more realistic translation. The comparison leads to the conclusion that, compared with the control group, the experimental group completes the face makeup-removal task well while requiring only a small number of paired makeup/makeup-removal images.
These data show that the image translation method reduces the image translation task's dependence on paired images and yields clearly better translation results; the effect is very significant.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A multi-scale image translation method based on semi-supervised learning is characterized by comprising the following specific steps:
training a model in a supervised manner on paired data, based on a multi-scale generative adversarial network;
and training the model in an unsupervised manner on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, to obtain a high-performance image translation model.
2. The multi-scale image translation method based on semi-supervised learning as recited in claim 1, wherein training the model in a supervised manner on paired data, based on the multi-scale generative adversarial network, comprises the following specific steps:
acquiring a data set {(x_i, y_i)} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
simultaneously training two dual convolutional network models G and F in a supervised manner on the paired image data, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
optimizing the models by minimizing the L1 distance between the output and the target;
and based on the two dual convolutional network models G and F, obtaining, through the discriminators of the multi-scale generative adversarial network, models whose generator outputs are consistent with the target domain.
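To illustrate the supervised step of claim 2 (minimizing the L1 distance between generator output and target), here is a minimal numpy sketch in which the "generator" is a toy linear map trained by subgradient descent on the L1 objective. All names, shapes, and data are hypothetical, not the patent's convolutional networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a single linear map W applied to flattened inputs.
def generator(W, x):
    return x @ W

def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

# Paired data (x_i, y_i): y_i is a fixed linear transform of x_i.
X = rng.normal(size=(32, 4))
W_true = rng.normal(size=(4, 4))
Y = X @ W_true

W = np.zeros((4, 4))
loss_initial = l1_loss(generator(W, X), Y)
for _ in range(400):
    residual = generator(W, X) - Y
    # Subgradient of the mean L1 distance ||G(x) - y||_1 w.r.t. W.
    grad = X.T @ np.sign(residual) / len(X)
    W -= 0.02 * grad

loss_final = l1_loss(generator(W, X), Y)
print(loss_final < loss_initial)  # the L1 objective decreases
```

In the patent's setting the same objective would be minimized over the parameters of the two dual convolutional networks G and F, typically with a stochastic gradient optimizer.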
3. The multi-scale image translation method based on semi-supervised learning as claimed in claim 2, wherein training based on the discrimination loss of the multi-scale generative adversarial network comprises the following specific steps:
discriminating images at different scales through a plurality of discriminators D_i, each discriminator D_i having the loss function
λ_{D_i}(D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x)))
where x and y are paired image data from the data set, G(x) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, D_i(y) is the classification probability that discriminator D_i outputs for y, and D_i(G(x)) is the classification probability that D_i outputs for G(x);
combining the gradients of the loss functions of all the discriminators by taking their average, giving the total discriminator loss
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(D_i, x, y)
where N is the number of discriminator scales;
the generator iterates the model based on the gradients of the loss functions of all the discriminators, the loss function of the generator being λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
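The alternating training of claim 3 evaluates a loss at each scale and averages them. A minimal numpy sketch of those two computations, assuming the standard GAN discriminator loss -log D_i(y) - log(1 - D_i(G(x))) at each scale (the probability values below are hypothetical):

```python
import numpy as np

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Per-scale discriminator loss: -log D_i(y) - log(1 - D_i(G(x))).
    d_real is D_i's probability for a real image, d_fake for a generated one."""
    eps = 1e-12  # guard against log(0)
    return float(-np.log(d_real + eps) - np.log(1.0 - d_fake + eps))

def multiscale_discriminator_loss(preds_real, preds_fake) -> float:
    """Average the per-scale losses, i.e. (1/N) * sum_i lambda_{D_i}."""
    losses = [discriminator_loss(r, f) for r, f in zip(preds_real, preds_fake)]
    return float(np.mean(losses))

# Classification probabilities from three discriminators operating at
# different image scales (hypothetical values):
real_probs = [0.9, 0.8, 0.85]   # D_i(y)
fake_probs = [0.2, 0.3, 0.25]   # D_i(G(x))
print(round(multiscale_discriminator_loss(real_probs, fake_probs), 4))
```

In practice each D_i would be a convolutional classifier applied to a differently downsampled copy of the image, and the averaged loss would be backpropagated through all of them.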
4. The multi-scale image translation method based on semi-supervised learning as claimed in claim 3, wherein, for the two dual convolutional network models G and F, the loss functions of the discriminators of the multi-scale generative adversarial network obtained through training are
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x)))
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)))
and the loss functions of the generators are
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1
where D_X(·) is the classification probability output by discriminator D_X, D_Y(·) is the classification probability output by discriminator D_Y, G(·) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, and F(·) is the conversion relation between the original image and the generated sample.
5. The multi-scale image translation method based on semi-supervised learning as claimed in claim 4, wherein training the model in an unsupervised manner on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, comprises the following specific steps:
acquiring two unpaired data sets X and Y;
based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data x_j, y_j to obtain two translation models G: X → Y and F: Y → X that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y;
while training the two translation models, judging the difference between the generated data and the real data of the target domain through the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
and training the model according to this difference so that the output distribution of the model fits the distribution of the target-domain images, obtaining generators whose output samples are consistent with the target domain.
6. The multi-scale image translation method based on semi-supervised learning as recited in claim 5, wherein the cycle consistency satisfied by the two translation models is specifically: for any x ∈ X: x → G(x) → F(G(x)) ≈ x, and for any y ∈ Y: y → F(y) → G(F(y)) ≈ y.
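The cycle-consistency property of claim 6 can be checked numerically. In this toy sketch G and F are exact inverses of one another, so F(G(x)) = x and G(F(y)) = y hold exactly; real translation networks satisfy the property only approximately, which is why the loss in claim 8 penalizes the residual rather than requiring equality:

```python
import numpy as np

# Toy invertible "translators" between domains X and Y: G scales and
# shifts, and F is constructed as its exact inverse.
def G(x):  # X -> Y
    return 2.0 * x + 1.0

def F(y):  # Y -> X
    return (y - 1.0) / 2.0

x = np.array([0.3, -1.2, 4.0])
y = np.array([5.0, 0.5])

# x -> G(x) -> F(G(x)) ~= x  and  y -> F(y) -> G(F(y)) ~= y
print(np.allclose(F(G(x)), x))  # True
print(np.allclose(G(F(y)), y))  # True
```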
7. The multi-scale image translation method based on semi-supervised learning as claimed in claim 6, wherein the objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x)))
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)))
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability output by discriminator D_X, D_Y(·) is the classification probability output by discriminator D_Y, G(·) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, and F(·) is the conversion relation between the original image and the generated sample.
8. The multi-scale image translation method based on semi-supervised learning of claim 7, wherein the cycle consistency loss function of the high-performance image translation model is as follows:
λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1
optimizing the adversarial loss and the cycle-consistency loss function to train the model, the total loss function of the generators being: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio between the two losses.
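The total unpaired objective of claim 8 combines the two adversarial generator terms with the weighted cycle-consistency term. A minimal numpy sketch of the combination, using hypothetical discriminator probabilities and toy reconstructions (not the patent's networks):

```python
import numpy as np

def generator_adv_loss(d_fake: float) -> float:
    """Adversarial term -log D(G(x)) (or -log D(F(y)))."""
    return -float(np.log(d_fake + 1e-12))

def cycle_loss(x, x_rec, y, y_rec) -> float:
    """l_cons = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 (mean over pixels)."""
    return float(np.mean(np.abs(x_rec - x)) + np.mean(np.abs(y_rec - y)))

def total_unpaired_loss(dY_fake, dX_fake, x, x_rec, y, y_rec, lam=10.0):
    """l_unpaired = l_G + l_F + lambda * l_cons; lam weights the
    cycle-consistency term against the adversarial terms."""
    return (generator_adv_loss(dY_fake) + generator_adv_loss(dX_fake)
            + lam * cycle_loss(x, x_rec, y, y_rec))

# Toy 2-pixel "images" and their cycle reconstructions:
x = np.array([0.2, 0.4]); x_rec = np.array([0.25, 0.35])
y = np.array([0.8, 0.6]); y_rec = np.array([0.8, 0.7])
val = total_unpaired_loss(0.5, 0.5, x, x_rec, y, y_rec, lam=10.0)
print(round(val, 4))
```

The value of lam is a design choice; CycleGAN-style models commonly weight the cycle term around 10, but the patent leaves the hyperparameter unspecified.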
CN202011500040.4A 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning Withdrawn CN112434798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011500040.4A CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011500040.4A CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN112434798A true CN112434798A (en) 2021-03-02

Family

ID=74696715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011500040.4A Withdrawn CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN112434798A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076806A (en) * 2021-03-10 2021-07-06 湖北星地智链科技有限公司 Structure-enhanced semi-supervised online map generation method
CN113569917A (en) * 2021-07-01 2021-10-29 浙江大学 Self-supervision image translation method and system
CN113569917B (en) * 2021-07-01 2023-12-12 浙江大学 Self-supervision image translation method and system

Similar Documents

Publication Publication Date Title
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN108520535B (en) Object classification method based on depth recovery information
Bo et al. Fast algorithms for large scale conditional 3D prediction
JP2022548712A (en) Image Haze Removal Method by Adversarial Generation Network Fusing Feature Pyramids
Kolesnikov et al. PixelCNN models with auxiliary variables for natural image modeling
Liu et al. Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion
CN111429340A (en) Cyclic image translation method based on self-attention mechanism
CN108446589B (en) Face recognition method based on low-rank decomposition and auxiliary dictionary in complex environment
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN111754532B (en) Image segmentation model searching method, device, computer equipment and storage medium
Hara et al. Towards good practice for action recognition with spatiotemporal 3d convolutions
CN111582230A (en) Video behavior classification method based on space-time characteristics
CN112434798A (en) Multi-scale image translation method based on semi-supervised learning
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN113807176B (en) Small sample video behavior recognition method based on multi-knowledge fusion
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Huang et al. A parallel architecture of age adversarial convolutional neural network for cross-age face recognition
CN112950480A (en) Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
Lin et al. R 2-resnext: A resnext-based regression model with relative ranking for facial beauty prediction
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN116206327A (en) Image classification method based on online knowledge distillation
Ding et al. Sequential convolutional network for behavioral pattern extraction in gait recognition
CN114299344A (en) Low-cost automatic searching method of neural network structure for image classification
Zhang et al. Learning to search efficient densenet with layer-wise pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210302