CN112434798A - Multi-scale image translation method based on semi-supervised learning - Google Patents

Multi-scale image translation method based on semi-supervised learning

Info

Publication number
CN112434798A
CN112434798A
Authority
CN
China
Prior art keywords
model
scale
generator
image
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011500040.4A
Other languages
Chinese (zh)
Inventor
冷勇 (Leng Yong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangyun Zhihui Technology Co ltd
Original Assignee
Beijing Xiangyun Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiangyun Zhihui Technology Co ltd filed Critical Beijing Xiangyun Zhihui Technology Co ltd
Priority to CN202011500040.4A priority Critical patent/CN112434798A/en
Publication of CN112434798A publication Critical patent/CN112434798A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-scale image translation method based on semi-supervised learning. A generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework and discriminates multiple scales of the generated image simultaneously, preventing unreasonable forged objects from appearing locally in the generated image. The method combines the generative adversarial network with dual learning to train the model on unpaired data, improving the effect of semi-supervised image translation: the unpaired image information is used effectively to raise model performance and to reduce the dependence of model training on paired images. The multi-scale image translation method based on semi-supervised learning overcomes the drawback that supervised image translation algorithms require large amounts of training data, while also accelerating model convergence and improving model performance. The effect is significant and the method is suitable for wide adoption.

Description

Multi-scale image translation method based on semi-supervised learning
Technical Field
The invention relates to the technical field of image translation, in particular to a multi-scale image translation method based on semi-supervised learning.
Background
Image translation refers to the task of automatically transforming one representation of an image's scene into another. Convolutional neural networks are currently used as image translation models, but the model architecture, loss function design, and optimization strategy differ from task to task, which greatly increases the burden of model design.
Minimizing the Euclidean distance between the model prediction and the ground-truth label causes the model to output blurred images. Image translation algorithms based on supervised learning require large amounts of paired input-output training data; existing image translation methods rely on supervised models that need many paired examples, and in practice paired data is difficult and expensive to obtain.
To address these problems, namely the complex construction of task-specific models, the heavy model design burden, and the large amount of data required for model training and learning, the present method is designed to achieve efficient and accurate image translation.
Disclosure of Invention
In view of these shortcomings, the technical problem addressed by the invention is to provide a multi-scale image translation method based on semi-supervised learning, so as to solve the prior-art problems of complex task-specific model construction, heavy model design burden, and the large amount of data required for model training and learning.
The invention provides a multi-scale image translation method based on semi-supervised learning, comprising the following specific steps:
supervised training of the model on paired data using a multi-scale generative adversarial network;
and unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, to obtain a high-performance image translation model.
Preferably, the supervised training of the model on paired data with the multi-scale generative adversarial network specifically includes:
acquiring a data set {x_i, y_i} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
simultaneously training the two dual convolutional network models G and F on the paired image data in a supervised manner, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
optimizing the model by minimizing the L1 distance between the output and the target;
and, based on the two dual convolutional network models G and F, obtaining a model whose generator output is consistent with the target domain through the discriminators of the multi-scale generative adversarial network.
Preferably, the training based on the discrimination loss of the multi-scale generative adversarial network comprises the following specific steps:
discriminating images at different scales with a plurality of discriminators D_i, where each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))),
where x and y are paired image data from the data set, G(x) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, D_i(y) is the classification probability that discriminator D_i assigns to y, and D_i(G(x)) is the classification probability that D_i assigns to G(x);
combining the gradients of the loss functions of all discriminators by averaging, giving the total discriminator loss function
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y);
iterating the generator based on the gradients of the loss functions of all discriminators, giving the generator G the loss function λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
Preferably, for the two dual convolutional network models G and F, the loss functions of their discriminators obtained through multi-scale generative adversarial training are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1,
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1, where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
Preferably, the unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, comprises the following specific steps:
acquiring two unpaired data sets X and Y;
based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data {x_j}, {y_j} to obtain two translation models G: x → y and F: y → x that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y;
while the two translation models are obtained through training, judging the difference between the generated data and the real data of the target domain with the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
and training the model according to this difference so that its output distribution fits the distribution of the target-domain images, obtaining a generator whose output samples are consistent with the target domain.
Preferably, the cycle consistency satisfied by the two translation models is specifically: for any x ∈ X: x → G(x) → F(G(x)) ≈ x, and for any y ∈ Y: y → F(y) → G(F(y)) ≈ y.
Preferably, the objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
Preferably, the cycle consistency loss function of the high-performance image translation model is:
λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1;
the adversarial loss and the cycle consistency loss function are optimized to train the model, and the total loss function of the generator is: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio of the two losses.
Compared with the prior art, the multi-scale image translation method based on semi-supervised learning has the following advantages. The generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework, so the model is simple to construct and the model design burden is light; discriminating multiple scales of the generated image simultaneously prevents unreasonable forged objects from appearing locally in the generated image. The method combines the generative adversarial network with dual learning to train the model on unpaired data, improving the effect of semi-supervised image translation: the unpaired image information is used effectively to raise model performance and to reduce the dependence of model training on paired images. The method overcomes the drawback that supervised image translation algorithms require large amounts of training data, accelerates model convergence, improves model performance, and solves the prior-art problems of complex task-specific model construction, heavy model design burden, and the large amount of data required for training and learning. The effect is significant and the method is suitable for wide adoption.
Drawings
In order to illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a first process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
fig. 2 is a second process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
fig. 3 is a third process block diagram of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 4 is a block diagram of a process of training two models through supervised learning in a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 5 is a block diagram of a process of improving performance through dual learning of a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 6 is a comparison diagram obtained by translating 3 made-up face images of different people;
fig. 7 is a comparison diagram obtained by translating 3 face images of the same person with different makeup styles.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 to 5, a multi-scale image translation method based on semi-supervised learning according to an embodiment of the present invention is described below. The method is a semi-supervised image translation method based on the L1 loss, the cycle consistency of dual learning, and the adversarial loss of a multi-scale generative adversarial network. In the supervised training phase, the model is trained by optimizing the L1 loss and the discrimination loss; in the unsupervised learning phase, the model is trained based on the cycle consistency of dual learning and the adversarial loss of the multi-scale generative adversarial network. The specific steps are as follows:
S1, supervised training of the model on paired data using a multi-scale generative adversarial network;
the generative adversarial network consists of a pair of networks, a generator network and a discriminator network: the generator models the latent distribution of the data and generates new samples, while the discriminator judges the difference between real samples and generated samples. Semi-supervised training of the model involves two data sets: a paired data set {x_i, y_i}, and an unpaired data set consisting of X-domain images {x_j} and Y-domain images {y_j}.
The specific implementation steps of the step can be as follows:
S1.1, acquiring a data set {x_i, y_i} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
S1.2, simultaneously training the two dual convolutional network models G and F on the paired image data in a supervised manner, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
The discriminator D is a differentiable function whose inputs are the real data x and, respectively, samples produced from a random variable z; the differentiable function G(z) is a sample generated by the generator that follows the real data distribution as closely as possible. The discriminator can be a multi-scale discriminator with shared weights; both the generator and the discriminator use fully convolutional networks, and the generator architecture can adopt a U-Net structure so that low-level information of the input image is exploited during image translation. In the fully convolutional discrimination network, different layers correspond to activation values at different image scales: the activations of the lower layers correspond to small image patches, while the activations of the upper layers correspond to the whole image. A single network therefore represents multiple discriminators and performs discrimination of multi-scale images simultaneously.
The discriminator of the generative adversarial network is a multi-class classification network, and D_i are the classification probabilities of the different classes.
The input network transforms the original image into a feature space through the domain mapping function F: X → Y such that F(x_i) ≈ y_i, and the output network G maps the features to the target-domain image; the domain mapping function G: X → Y is obtained from the function F such that F(x) ≈ F(G(x)).
S1.3, optimizing a model based on the L1 distance between the minimized output and the target;
s1.4, based on the dual two convolution network models G and F, obtaining a model with the output of the generator consistent with the target domain through a discriminator of a multi-scale generation countermeasure network.
In the multi-scale generative adversarial network, different scales correspond to different discriminators, and each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))).
The overall loss function of the discriminator is:
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y).
The loss function of the generator is: λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1. Different convolutional layers yield feature maps of different scales, and the activation values of the different feature layers are the feature values extracted from image patches of different scales. The feature maps of the different layers can be transformed by 1×1 convolutions into single-channel feature maps of the corresponding scale, and the output probability of the discriminator is then obtained through an activation function.
The network is trained by alternately training the discriminator D and the generator G, using mini-batch stochastic gradient descent with the Adam optimizer. The discriminator is also trained on images produced by past versions of the generator, so that it retains a memory of previously generated samples; this mitigates the mode collapse problem of generative adversarial networks. The learning process of the generative adversarial network is a continual game of iterative optimization between the generator and the discriminator. Specifically: fix the generator G and optimize the discriminator D to maximize its discrimination accuracy; then fix the discriminator D and optimize the generator G to minimize D's discrimination accuracy, finally reaching a globally optimal solution. Training the discriminator D is a process of minimizing cross entropy. The model captures the high-frequency components of the image through the adversarial loss, and the multi-scale discriminator of the generative adversarial network forces the output of the model to be consistent with the target image.
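The alternating game described above can be sketched as a training-loop skeleton. The update functions here are stubs standing in for real optimizer steps (the patent specifies mini-batch SGD with Adam), and the bounded buffer of past generator outputs models the "memory" of earlier samples used to mitigate mode collapse; all names and the buffer size are illustrative assumptions.

```python
def train_adversarial(num_steps, d_step, g_step, history_size=50):
    # d_step(buffer): fix G, update D on real samples plus buffered fakes.
    # g_step(): fix D, update G; returns the sample G produced.
    fake_history = []   # past generator outputs shown to D
    schedule = []       # records the D/G alternation
    for _ in range(num_steps):
        d_step(fake_history)
        schedule.append("D")
        fake_history.append(g_step())
        schedule.append("G")
        del fake_history[:-history_size]  # keep only recent fakes
    return schedule, fake_history
```

Each iteration first updates D with the generator fixed, then updates G with the discriminator fixed, exactly the alternation the text describes.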
S2, unsupervised training of the model on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the generative adversarial network, to obtain a high-performance image translation model.
In the multi-scale generative adversarial network, images of different scales are discriminated by multiple discriminators. The generator iterates the model based on the gradients of the losses of all discriminators, which are combined by averaging. The specific steps are as follows:
Discriminating images at different scales with a plurality of discriminators D_i, where each discriminator D_i has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))),
where x and y are paired image data from the data set, G(x) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, D_i(y) is the classification probability that discriminator D_i assigns to y, and D_i(G(x)) is the classification probability that D_i assigns to G(x);
combining the gradients of the loss functions of all discriminators by averaging, giving the total discriminator loss function
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y);
iterating the generator based on the gradients of the loss functions of all discriminators, giving the generator G the loss function λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
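The per-scale discriminator loss, its average over all scales, and the generator loss described in the steps above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: the discriminator outputs are taken as pre-computed classification probabilities, and images are flattened lists of pixel values.

```python
import math

def discriminator_loss_i(d_real, d_fake):
    # Per-scale loss for discriminator D_i:
    # -log D_i(y) - log(1 - D_i(G(x)))
    return -math.log(d_real) - math.log(1.0 - d_fake)

def total_discriminator_loss(d_reals, d_fakes):
    # Combine the per-scale losses of all N discriminators by averaging.
    losses = [discriminator_loss_i(r, f) for r, f in zip(d_reals, d_fakes)]
    return sum(losses) / len(losses)

def generator_loss(d_fakes, g_x, y):
    # Generator loss: -sum_i log D_i(G(x)) + ||G(x) - y||_1
    adv = -sum(math.log(f) for f in d_fakes)
    l1 = sum(abs(a - b) for a, b in zip(g_x, y))
    return adv + l1
```

For instance, with two scales whose discriminators assign probability 0.9 to real samples and 0.1 to fakes, the averaged discriminator loss is about 0.21.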
For the two dual convolutional network models G and F, the loss functions of their discriminators obtained through multi-scale generative adversarial training are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1,
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1, where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
The specific implementation steps of the step can be as follows:
S2.1, acquiring two unpaired data sets X and Y, i.e. the two image data sets {x_j} and {y_j};
S2.2, based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data {x_j}, {y_j} to obtain two translation models G: x → y and F: y → x that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y, and G and F form a closed mapping loop;
two translation models G: x → y and F: y → X can form a forward cycle X → Y → X and a reverse cycle Y → X → Y, with the resulting output for each cycle being identical to the input, satisfying cycle consistency. Wherein for any X ∈ X: x → G (x) → F (G (x)) ≈ x, and for an arbitrary Y ∈ Y: y → F (y) → G (F (y) ≈ y).
The performance of the model is improved by utilizing unpaired data based on the cycle consistency of the dual learning model, and the obtained objective function is as follows: lambda [ alpha ]cons(G,F,x,y)=||F(G(x))-x||1+||G(F(y))-y||1
S2.3, after the two translation models are obtained through training, judging the difference between the generated data and the real data of the target domain with the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
In the multi-scale generative adversarial network, different scales correspond to different discriminators, and each discriminator has the loss function
λ_{D_i}(G, D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x))).
The overall loss function of the discriminator is:
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(G, D_i, x, y).
The corresponding generator loss function is: λ_G(G, D, x, y) = -Σ_i log D_i(G(x)).
S2.4, training the model according to this difference so that its output distribution fits the distribution of the target-domain images, obtaining a generator whose output samples are consistent with the target domain.
The discriminator of the generative adversarial network forces the output distribution of the model to fit the distribution of the target-domain images, so that the unpaired data improves the performance of the model.
The objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x))),
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)));
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability produced by discriminator D_X, D_Y(·) is the classification probability produced by discriminator D_Y, G(·) is a sample generated by the generator that follows the distribution of the real data x as closely as possible, and F(·) is the dual mapping that converts a target-domain image back to the source domain.
The cycle consistency loss function is: λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1.
The adversarial loss and the cycle consistency loss function are optimized simultaneously to train the model, and the total loss function of the generator is: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio of the two losses.
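Combining the terms above, the total generator objective on unpaired data can be sketched as follows. The discriminator outputs are taken as pre-computed probabilities, and the default value of the hyperparameter λ is an assumption for illustration; the patent does not state one.

```python
import math

def unpaired_generator_loss(dY_of_Gx, dX_of_Fy, cons_loss, lam=10.0):
    # l_unpaired = l_G + l_F + lambda * l_cons, where
    # l_G = -log D_Y(G(x)) and l_F = -log D_X(F(y)).
    l_G = -math.log(dY_of_Gx)
    l_F = -math.log(dX_of_Fy)
    return l_G + l_F + lam * cons_loss
```

The hyperparameter lam trades off how strongly the cycle-consistency term constrains the two generators relative to the adversarial terms.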
The multi-scale image translation method based on semi-supervised learning first trains the model with a small amount of paired data. Compared with the prior art, the generative adversarial network with multi-scale discrimination completes all image translation tasks within a unified framework, and discriminates multiple scales of the generated image simultaneously, preventing unreasonable forged objects from appearing locally in the generated image.
Supervised training of the model with the multi-scale-discrimination generative adversarial network improves the generation quality of the original generative adversarial network: the model captures both the overall structural outline of the image and its local details, making the generated details more plausible. The game between the discriminator and the generator forces the generator toward the real sample distribution, while the unpaired data further improves the model and reduces the image translation task's dependence on paired images.
The model is then trained in an unsupervised manner based on the cycle consistency of dual learning and the discrimination loss of the generative adversarial network, which uses the information in the unpaired images effectively to improve model performance and to reduce the dependence of model training on paired images. Through multi-scale discrimination, the generative adversarial network applies unpaired image data to model training, and the adversarial loss forces the samples produced by the generator toward the target images, reducing the need for paired samples and improving the translation results; combining the generative adversarial network with dual learning to train on unpaired data improves the effect of semi-supervised image translation. The method overcomes the drawback that supervised image translation algorithms require large amounts of training data, while also accelerating model convergence and improving model performance.
The following is an example. The image translation method is applied to a face makeup-removal task: 100 original face images were collected from the internet, makeup software was used to generate made-up face images, and a data set of 800 original-face/made-up-face image pairs was obtained for training and testing the model. In the experiment, the data set was divided into 4 parts: 20% paired data for supervised training, 60% unpaired data for unsupervised training, 10% for validation, and 10% for testing the performance of the algorithm. The L1 distance between the model output on the test set and the corresponding ground-truth label is used to evaluate the algorithm.
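The evaluation metric, the mean L1 distance between model outputs and ground-truth labels over the test set, can be sketched as follows (images again as flat lists of pixel values; all names are illustrative).

```python
def mean_l1_distance(outputs, labels):
    # Average per-pixel absolute difference across the whole test set;
    # smaller values mean the translated images are closer to the real ones.
    total, count = 0.0, 0
    for out, label in zip(outputs, labels):
        for p, t in zip(out, label):
            total += abs(p - t)
            count += 1
    return total / count
```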
The specific procedure was as follows. An experimental group and a control group were set up: the experimental group used the multi-scale image translation method based on semi-supervised learning, and the control group used a model trained with the L1 loss function. 3 made-up face images of different people were randomly drawn from the data set, and the models obtained by the experimental and control groups were used to remove the makeup and restore the original face images; in addition, face images of the same person with different makeup styles were randomly selected from the test set and converted to the original images through the same makeup-removal task. Referring to fig. 6 and 7, each row shows, from left to right, the made-up face image, the original face image, the makeup-removal result of the experimental group, and the makeup-removal result of the control group. Analysis yields the following experimental results:
The L1 distance of the experimental group on the face makeup-removal task was 0.034, while that of the control group was 0.089. A smaller L1 distance means a smaller difference between the translated image and the real image, i.e. a more realistic translation. The comparison leads to the conclusion that, compared with the control group, the experimental group completes the face makeup-removal task well while requiring only a small number of paired makeup/makeup-removal images.
These data show that the image translation method reduces the image translation task's dependence on paired images and yields clearly better translation results; the effect is very significant.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A multi-scale image translation method based on semi-supervised learning is characterized by comprising the following specific steps:
training a model in a supervised manner on paired data, based on a multi-scale generative adversarial network;
and training the model in an unsupervised manner on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, to obtain a high-performance image translation model.
2. The multi-scale image translation method based on semi-supervised learning as recited in claim 1, wherein training the model in a supervised manner on paired data, based on the multi-scale generative adversarial network, comprises the following specific steps:
acquiring a data set {(x_i, y_i)} consisting of paired images, where x_i ∈ X, y_i ∈ Y, and X and Y are two associated image domains;
simultaneously training two dual convolutional network models G and F in a supervised manner on the paired image data, where the discriminator corresponding to model G is D_Y and the discriminator corresponding to model F is D_X;
optimizing the models by minimizing the L1 distance between the output and the target;
and based on the two dual convolutional network models G and F, obtaining, through the discriminators of the multi-scale generative adversarial network, models whose generator outputs are consistent with the target domain.
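To illustrate the supervised step of claim 2 (minimizing the L1 distance between generator output and target), here is a minimal numpy sketch in which the "generator" is a toy linear map trained by subgradient descent on the L1 objective. All names, shapes, and data are hypothetical, not the patent's convolutional networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a single linear map W applied to flattened inputs.
def generator(W, x):
    return x @ W

def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

# Paired data (x_i, y_i): y_i is a fixed linear transform of x_i.
X = rng.normal(size=(32, 4))
W_true = rng.normal(size=(4, 4))
Y = X @ W_true

W = np.zeros((4, 4))
loss_initial = l1_loss(generator(W, X), Y)
for _ in range(400):
    residual = generator(W, X) - Y
    # Subgradient of the mean L1 distance ||G(x) - y||_1 w.r.t. W.
    grad = X.T @ np.sign(residual) / len(X)
    W -= 0.02 * grad

loss_final = l1_loss(generator(W, X), Y)
print(loss_final < loss_initial)  # the L1 objective decreases
```

In the patent's setting the same objective would be minimized over the parameters of the two dual convolutional networks G and F, typically with a stochastic gradient optimizer.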
3. The multi-scale image translation method based on semi-supervised learning as claimed in claim 2, wherein training based on the discrimination loss of the multi-scale generative adversarial network comprises the following specific steps:
discriminating images at different scales through a plurality of discriminators D_i, each discriminator D_i having the loss function
λ_{D_i}(D_i, x, y) = -log D_i(y) - log(1 - D_i(G(x)))
where x and y are paired image data from the data set, G(x) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, D_i(y) is the classification probability that discriminator D_i outputs for y, and D_i(G(x)) is the classification probability that D_i outputs for G(x);
combining the gradients of the loss functions of all the discriminators by taking their average, giving the total discriminator loss
λ_D(G, D, x, y) = (1/N) Σ_{i=1}^{N} λ_{D_i}(D_i, x, y)
where N is the number of discriminator scales;
the generator iterates the model based on the gradients of the loss functions of all the discriminators, the loss function of the generator being λ_G(G, D, x, y) = -Σ_i log D_i(G(x)) + ||G(x) - y||_1;
and alternately training the discriminators and the generator according to the above steps to obtain the multi-scale generative adversarial network model.
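The alternating training of claim 3 evaluates a loss at each scale and averages them. A minimal numpy sketch of those two computations, assuming the standard GAN discriminator loss -log D_i(y) - log(1 - D_i(G(x))) at each scale (the probability values below are hypothetical):

```python
import numpy as np

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Per-scale discriminator loss: -log D_i(y) - log(1 - D_i(G(x))).
    d_real is D_i's probability for a real image, d_fake for a generated one."""
    eps = 1e-12  # guard against log(0)
    return float(-np.log(d_real + eps) - np.log(1.0 - d_fake + eps))

def multiscale_discriminator_loss(preds_real, preds_fake) -> float:
    """Average the per-scale losses, i.e. (1/N) * sum_i lambda_{D_i}."""
    losses = [discriminator_loss(r, f) for r, f in zip(preds_real, preds_fake)]
    return float(np.mean(losses))

# Classification probabilities from three discriminators operating at
# different image scales (hypothetical values):
real_probs = [0.9, 0.8, 0.85]   # D_i(y)
fake_probs = [0.2, 0.3, 0.25]   # D_i(G(x))
print(round(multiscale_discriminator_loss(real_probs, fake_probs), 4))
```

In practice each D_i would be a convolutional classifier applied to a differently downsampled copy of the image, and the averaged loss would be backpropagated through all of them.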
4. The multi-scale image translation method based on semi-supervised learning as claimed in claim 3, wherein, for the two dual convolutional network models G and F, the loss functions of the discriminators of the multi-scale generative adversarial network obtained through training are
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x)))
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)))
and the loss functions of the generators are
λ_G(G, D_Y, x, y) = -log D_Y(G(x)) + ||G(x) - y||_1
λ_F(F, D_X, y, x) = -log D_X(F(y)) + ||F(y) - x||_1
where D_X(·) is the classification probability output by discriminator D_X, D_Y(·) is the classification probability output by discriminator D_Y, G(·) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, and F(·) is the conversion relation between the original image and the generated sample.
5. The multi-scale image translation method based on semi-supervised learning as claimed in claim 4, wherein training the model in an unsupervised manner on unpaired data, based on the cycle consistency of dual learning and the discrimination loss of the multi-scale generative adversarial network, comprises the following specific steps:
acquiring two unpaired data sets X and Y;
based on the two convolutional network models G and F obtained from supervised training, performing unsupervised training on the unpaired data x_j, y_j to obtain two translation models G: X → Y and F: Y → X that satisfy cycle consistency, where x_j ∈ X, y_j ∈ Y;
while training the two translation models, judging the difference between the generated data and the real data of the target domain through the two adversarial discriminators D_X and D_Y of the multi-scale generative adversarial network;
and training the model according to this difference so that the output distribution of the model fits the distribution of the target-domain images, obtaining generators whose output samples are consistent with the target domain.
6. The multi-scale image translation method based on semi-supervised learning as recited in claim 5, wherein the cycle consistency satisfied by the two translation models is specifically: for any x ∈ X: x → G(x) → F(G(x)) ≈ x, and for any y ∈ Y: y → F(y) → G(F(y)) ≈ y.
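The cycle-consistency property of claim 6 can be checked numerically. In this toy sketch G and F are exact inverses of one another, so F(G(x)) = x and G(F(y)) = y hold exactly; real translation networks satisfy the property only approximately, which is why the loss in claim 8 penalizes the residual rather than requiring equality:

```python
import numpy as np

# Toy invertible "translators" between domains X and Y: G scales and
# shifts, and F is constructed as its exact inverse.
def G(x):  # X -> Y
    return 2.0 * x + 1.0

def F(y):  # Y -> X
    return (y - 1.0) / 2.0

x = np.array([0.3, -1.2, 4.0])
y = np.array([5.0, 0.5])

# x -> G(x) -> F(G(x)) ~= x  and  y -> F(y) -> G(F(y)) ~= y
print(np.allclose(F(G(x)), x))  # True
print(np.allclose(G(F(y)), y))  # True
```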
7. The multi-scale image translation method based on semi-supervised learning as claimed in claim 6, wherein the objective functions of the discriminators of the high-performance image translation model are:
λ_{D_Y}(G, D_Y, x, y) = -log D_Y(y) - log(1 - D_Y(G(x)))
λ_{D_X}(F, D_X, y, x) = -log D_X(x) - log(1 - D_X(F(y)))
the loss functions of the generators are:
λ_G(G, D_Y, x, y) = -log D_Y(G(x)), λ_F(F, D_X, y, x) = -log D_X(F(y)), where D_X(·) is the classification probability output by discriminator D_X, D_Y(·) is the classification probability output by discriminator D_Y, G(·) is the sample generated by the generator so as to follow the distribution of the real data as closely as possible, and F(·) is the conversion relation between the original image and the generated sample.
8. The multi-scale image translation method based on semi-supervised learning of claim 7, wherein the cycle consistency loss function of the high-performance image translation model is as follows:
λ_cons(G, F, x, y) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1
optimizing the adversarial loss and the cycle-consistency loss function to train the model, the total loss function of the generators being: l_unpaired(G, F, x, y) = l_G(G, D_Y, x, y) + l_F(F, D_X, y, x) + λ·l_cons(G, F, x, y), where λ is a hyperparameter that controls the ratio between the two losses.
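The total unpaired objective of claim 8 combines the two adversarial generator terms with the weighted cycle-consistency term. A minimal numpy sketch of the combination, using hypothetical discriminator probabilities and toy reconstructions (not the patent's networks):

```python
import numpy as np

def generator_adv_loss(d_fake: float) -> float:
    """Adversarial term -log D(G(x)) (or -log D(F(y)))."""
    return -float(np.log(d_fake + 1e-12))

def cycle_loss(x, x_rec, y, y_rec) -> float:
    """l_cons = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 (mean over pixels)."""
    return float(np.mean(np.abs(x_rec - x)) + np.mean(np.abs(y_rec - y)))

def total_unpaired_loss(dY_fake, dX_fake, x, x_rec, y, y_rec, lam=10.0):
    """l_unpaired = l_G + l_F + lambda * l_cons; lam weights the
    cycle-consistency term against the adversarial terms."""
    return (generator_adv_loss(dY_fake) + generator_adv_loss(dX_fake)
            + lam * cycle_loss(x, x_rec, y, y_rec))

# Toy 2-pixel "images" and their cycle reconstructions:
x = np.array([0.2, 0.4]); x_rec = np.array([0.25, 0.35])
y = np.array([0.8, 0.6]); y_rec = np.array([0.8, 0.7])
val = total_unpaired_loss(0.5, 0.5, x, x_rec, y, y_rec, lam=10.0)
print(round(val, 4))
```

The value of lam is a design choice; CycleGAN-style models commonly weight the cycle term around 10, but the patent leaves the hyperparameter unspecified.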
CN202011500040.4A 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning Withdrawn CN112434798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011500040.4A CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011500040.4A CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN112434798A true CN112434798A (en) 2021-03-02

Family

ID=74696715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011500040.4A Withdrawn CN112434798A (en) 2020-12-18 2020-12-18 Multi-scale image translation method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN112434798A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076806A (en) * 2021-03-10 2021-07-06 湖北星地智链科技有限公司 Structure-enhanced semi-supervised online map generation method
CN113569917A (en) * 2021-07-01 2021-10-29 浙江大学 Self-supervision image translation method and system
CN113569917B (en) * 2021-07-01 2023-12-12 浙江大学 Self-supervision image translation method and system

Similar Documents

Publication Publication Date Title
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN108520535B (en) Object classification method based on depth recovery information
Bo et al. Fast algorithms for large scale conditional 3D prediction
JP2022548712A (en) Image Haze Removal Method by Adversarial Generation Network Fusing Feature Pyramids
Kolesnikov et al. PixelCNN models with auxiliary variables for natural image modeling
Liu et al. Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion
CN111429340A (en) Cyclic image translation method based on self-attention mechanism
CN108446589B (en) Face recognition method based on low-rank decomposition and auxiliary dictionary in complex environment
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN111754532B (en) Image segmentation model searching method, device, computer equipment and storage medium
Hara et al. Towards good practice for action recognition with spatiotemporal 3d convolutions
CN111582230A (en) Video behavior classification method based on space-time characteristics
CN112434798A (en) Multi-scale image translation method based on semi-supervised learning
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN113807176B (en) Small sample video behavior recognition method based on multi-knowledge fusion
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Huang et al. A parallel architecture of age adversarial convolutional neural network for cross-age face recognition
CN112950480A (en) Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
Lin et al. R 2-resnext: A resnext-based regression model with relative ranking for facial beauty prediction
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN116206327A (en) Image classification method based on online knowledge distillation
Ding et al. Sequential convolutional network for behavioral pattern extraction in gait recognition
CN114299344A (en) Low-cost automatic searching method of neural network structure for image classification
Zhang et al. Learning to search efficient densenet with layer-wise pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210302