CN109033095B - Target transformation method based on attention mechanism - Google Patents

Target transformation method based on attention mechanism

Info

Publication number
CN109033095B
Authority
CN
China
Prior art keywords
attention
layer
image
model
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810866277.0A
Other languages
Chinese (zh)
Other versions
CN109033095A (en)
Inventor
胡伏原
叶子寒
李林燕
孙钰
付保川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University of Science and Technology
Original Assignee
Suzhou University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University of Science and Technology filed Critical Suzhou University of Science and Technology
Priority to CN201810866277.0A priority Critical patent/CN109033095B/en
Publication of CN109033095A publication Critical patent/CN109033095A/en
Application granted granted Critical
Publication of CN109033095B publication Critical patent/CN109033095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a target transformation method based on an attention mechanism, which comprises the following steps: training a neural network model: step 1, initializing the parameters of the neural network model with random numbers; step 2, inputting an image x belonging to category X into the generator G of the model and entering the encoding stage, where a first-layer feature map f_1 is computed from x by a convolution layer. The trained neural network model is then used to perform target transformation on images. By introducing an attention mechanism into the model, the model can identify the target object to be converted in a target change task and thus distinguish the target from the background. Meanwhile, consistency between the backgrounds of the original image and the converted image is ensured by constructing an attention consistency loss function and a background consistency loss function.

Description

Target transformation method based on attention mechanism
Technical Field
The invention relates to image translation, in particular to a target transformation method based on an attention mechanism.
Background
Object transformation is a special task within image translation whose purpose is to transform a specific type of object in an image into another type of object. Image translation aims to convert an original image into an image of a target style by learning the mapping relationship between two classes of images, and in recent years has been applied in many areas such as image super-resolution reconstruction and artistic style transfer. Researchers have proposed many efficient transformation methods under supervised conditions. However, because acquiring paired data requires large labor and time costs, conversion under unsupervised conditions has become a research hotspot in image translation. Visual Attribute Transfer (VAT) is a representative convolutional neural network (CNN)-based approach, which uses features at different levels of the model to match the most likely corresponding features in another image. In addition, methods based on the Generative Adversarial Network (GAN) achieve more significant effects than methods based on convolutional neural networks. Isola P. et al. explored the potential of GAN in image translation tasks. Subsequently, the cycle-consistency loss was proposed by Zhu J.Y. et al. to solve the problem of unsupervised image translation; it assumes that the mapping learned in the image translation task is a bidirectional mapping, and thus enhances the image translation effect of the model in an unsupervised environment.
The traditional technology has the following technical problems:
most of the current image translation methods do not take into account the difference between the conversion object and the background region. In a target change task, most models are difficult to effectively distinguish a conversion target from a background, and the consistency of an original image background and a conversion image background cannot be ensured. Therefore, the model generates the effects of blurring, discoloring and the like on the image background in the conversion process, and the quality of the converted image is reduced.
Disclosure of Invention
In view of the above, it is necessary to provide a target transformation method based on an attention mechanism. By introducing an attention mechanism into the model, the model can identify the target object to be converted in a target change task and thus distinguish the target from the background. Meanwhile, consistency between the backgrounds of the original image and the converted image is ensured by constructing an attention consistency loss function and a background consistency loss function.
An attention-based target transformation method, comprising:
training a neural network model:
step 1, initializing parameters of a neural network model by using random numbers;
step 2, inputting an image x belonging to category X into the generator G of the model and entering the encoding stage, where a first-layer feature map f_1 is computed from x by a convolution layer;
step 3, f_1 then passes through two branch networks (a code sketch of this two-branch structure follows the training procedure below): (a) one convolution layer yields the second-layer feature map f̂_2 without attention-mask processing; (b) f_1 first passes through two convolution layers and then through one deconvolution layer to obtain the attention mask M_2 corresponding to f̂_2; M_2 is multiplied element-wise with f̂_2, and the resulting product is added element-wise to f̂_2 to obtain the processed second-layer feature map f_2;
step 4, f_2 yields the next-layer feature map f_3 in the same manner as step 3; f_3 then passes through 6 residual convolution layers with convolution kernel size 3 x 3 and stride 1 to extract finer features;
step 5, entering the decoding stage, with deconvolution layers used as the decoder; f_3 passes through two branch networks: (a) one deconvolution layer yields the feature map f̂_4 without attention-mask processing; (b) f_3 first passes through two deconvolution layers and then through one convolution layer to obtain the attention mask M_4 corresponding to f̂_4; M_4 is multiplied element-wise with f̂_4, and the resulting product is added element-wise to f̂_4 to obtain the processed feature map f_5;
step 6, entering the output stage, f_5 passes through two branch networks: (a) one deconvolution layer yields the converted image y′; (b) two deconvolution layers followed by one convolution layer yield the attention mask M_G(x) corresponding to y′;
step 7, y′ is input into another generator F and, after the same operations as in steps 2-6, x′ and the corresponding attention mask M_F(G(x)) are obtained;
step 8, x and x′ are input into the discriminator D_X, which returns the probability that the input image belongs to category X; likewise, y and y′ are input into the discriminator D_Y to obtain the probabilities that y and y′ belong to category Y; the values of the adversarial loss functions are then calculated:
L_GAN(G, D_Y, X, Y) = E_y~p(y)[log D_Y(y)] + E_x~p(x)[log(1 - D_Y(G(x)))]    (1)
L_GAN(F, D_X, Y, X) = E_x~p(x)[log D_X(x)] + E_y~p(y)[log(1 - D_X(F(y)))]    (2)
step 9, the value of the cycle-consistency loss function is calculated from x, x′, y, y′:
L_cyc(G, F) = ||x′ - x||_1 + ||y′ - y||_1    (3)
step 10, M_G(x) is used to separate the background from the conversion target in x and y′, and the background change loss is calculated:
L_bg(x, G) = γ*||B(x, M_G(x)) - B(y′, M_G(x))||_1    (4)
B(x, M_G(x)) = H(x, 1 - M_G(x))    (5)
γ is set to 0.000075 to 0.0075; the value of the function H(K, L) is obtained by multiplying the elements of K element-wise with the elements of L; likewise, M_F(G(x)) together with y and x may be used to calculate the background change loss L_bg(y, F);
step 11, the attention change loss is calculated from M_G(x) and M_F(G(x)):
L_att(x, G, F) = α*||M_G(x) - M_F(G(x))||_1 + β*(M_G(x) + M_F(G(x)))    (6)
α is set to 0.000003 to 0.00015, and β is set to 0.0000005 to 0.00005;
step 12, the model parameters are adjusted according to the errors obtained in steps 8-11 by a back-propagation algorithm with a learning rate of 0.00002 to 0.002;
step 13, y is taken as the input image and the error is calculated by the operations of steps 2-11, except that y passes through generator F first and then through generator G; the model parameters are adjusted according to the method of step 12;
step 14, steps 2-13 are repeated until the model parameters converge;
and carrying out target transformation on the image by using the neural network model obtained by training.
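As an illustration of the two-branch structure used in steps 3 and 5 above, the following PyTorch sketch shows one encode-stage attention unit. The channel counts, kernel sizes, strides and the ReLU placement are illustrative assumptions rather than values specified by the method, and the class name DAUEncodeBlock is introduced only for this example; only the sigmoid bounding of the mask and the multiply-then-add shortcut follow the steps described above.

```python
import torch
import torch.nn as nn

class DAUEncodeBlock(nn.Module):
    """Sketch of one encode-stage attention unit (cf. step 3):
    branch (a) is a single convolution producing the unmasked feature map;
    branch (b) is two convolutions followed by a deconvolution whose sigmoid
    output is the attention mask; the block returns f = f_hat + M * f_hat."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Branch (a): one strided convolution -> unmasked feature map f_hat.
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        # Branch (b): two convolutions, then one deconvolution restoring the
        # spatial size of f_hat; sigmoid bounds the mask to [0, 1].
        self.mask = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, f_prev):
        f_hat = self.feature(f_prev)   # feature map without mask processing
        m = self.mask(f_prev)          # attention mask M_n
        return f_hat + m * f_hat, m    # shortcut: f_n = f_hat + M_n (*) f_hat

# Toy usage: a 3-channel 128x128 input through one attention unit.
if __name__ == "__main__":
    block = DAUEncodeBlock(3, 64)
    f, mask = block(torch.randn(1, 3, 128, 128))
    print(f.shape, mask.shape)  # both torch.Size([1, 64, 64, 64])
```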
The above target transformation method based on the attention mechanism introduces an attention mechanism into the model so that the model can identify the target object to be converted in a target change task and thus distinguish the target from the background. Meanwhile, consistency between the backgrounds of the original image and the converted image is ensured by constructing an attention consistency loss function and a background consistency loss function.
In another embodiment, α is set to 0.000015.
In another embodiment, β is set to 0.000005.
In another embodiment, γ is set to 0.00075.
In another embodiment, the back propagation algorithm is optimized by Adam.
In another embodiment, the learning rate of the back propagation algorithm is 0.0002.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when the program is executed.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods.
A processor for running a program, wherein the program when running performs any of the methods.
Drawings
Fig. 1 is an overall schematic diagram of a model structure of an attention-based target transformation method according to an embodiment of the present application.
Fig. 2 shows three different DAU structures in an attention-based target transformation method according to an embodiment of the present application. (DAU_decode and DAU_final are structurally identical; only the depth of the output attention mask differs.)
FIG. 3 is a comparison of experimental results of an attention-based objective transformation method with the CycleGAN and VAT methods on ImageNet datasets, provided by an embodiment of the present application.
FIG. 4 is a comparison of experimental results of the attention-based target transformation method with the CycleGAN and VAT methods on the CelebA dataset, provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
An attention-based target transformation method, comprising:
training a neural network model:
step 1, initializing parameters of a neural network model by using random numbers;
step 2, inputting an image x belonging to category X into the generator G of the model and entering the encoding stage, where a first-layer feature map f_1 is computed from x by a convolution layer;
step 3, f_1 then passes through two branch networks: (a) one convolution layer yields the second-layer feature map f̂_2 without attention-mask processing; (b) f_1 first passes through two convolution layers and then through one deconvolution layer to obtain the attention mask M_2 corresponding to f̂_2; M_2 is multiplied element-wise with f̂_2, and the resulting product is added element-wise to f̂_2 to obtain the processed second-layer feature map f_2;
step 4, f_2 yields the next-layer feature map f_3 in the same manner as step 3; f_3 then passes through 6 residual convolution layers with convolution kernel size 3 x 3 and stride 1 to extract finer features;
step 5, entering the decoding stage, with deconvolution layers used as the decoder; f_3 passes through two branch networks: (a) one deconvolution layer yields the feature map f̂_4 without attention-mask processing; (b) f_3 first passes through two deconvolution layers and then through one convolution layer to obtain the attention mask M_4 corresponding to f̂_4; M_4 is multiplied element-wise with f̂_4, and the resulting product is added element-wise to f̂_4 to obtain the processed feature map f_5;
step 6, entering the output stage, f_5 passes through two branch networks: (a) one deconvolution layer yields the converted image y′; (b) two deconvolution layers followed by one convolution layer yield the attention mask M_G(x) corresponding to y′;
step 7, y′ is input into another generator F and, after the same operations as in steps 2-6, x′ and the corresponding attention mask M_F(G(x)) are obtained;
step 8, x and x′ are input into the discriminator D_X, which returns the probability that the input image belongs to category X; likewise, y and y′ are input into the discriminator D_Y to obtain the probabilities that y and y′ belong to category Y; the values of the adversarial loss functions are then calculated:
L_GAN(G, D_Y, X, Y) = E_y~p(y)[log D_Y(y)] + E_x~p(x)[log(1 - D_Y(G(x)))]    (1)
L_GAN(F, D_X, Y, X) = E_x~p(x)[log D_X(x)] + E_y~p(y)[log(1 - D_X(F(y)))]    (2)
step 9, the value of the cycle-consistency loss function is calculated from x, x′, y, y′:
L_cyc(G, F) = ||x′ - x||_1 + ||y′ - y||_1    (3)
step 10, M_G(x) is used to separate the background from the conversion target in x and y′, and the background change loss is calculated:
L_bg(x, G) = γ*||B(x, M_G(x)) - B(y′, M_G(x))||_1    (4)
B(x, M_G(x)) = H(x, 1 - M_G(x))    (5)
γ is set to 0.000075 to 0.0075; the value of the function H(K, L) is obtained by multiplying the elements of K element-wise with the elements of L; likewise, M_F(G(x)) together with y and x may be used to calculate the background change loss L_bg(y, F);
step 11, the attention change loss is calculated from M_G(x) and M_F(G(x)):
L_att(x, G, F) = α*||M_G(x) - M_F(G(x))||_1 + β*(M_G(x) + M_F(G(x)))    (6)
α is set to 0.000003 to 0.00015, and β is set to 0.0000005 to 0.00005;
step 12, the model parameters are adjusted according to the errors obtained in steps 8-11 by a back-propagation algorithm with a learning rate of 0.00002 to 0.002;
step 13, y is taken as the input image and the error is calculated by the operations of steps 2-11, except that y passes through generator F first and then through generator G; the model parameters are adjusted according to the method of step 12;
step 14, steps 2-13 are repeated until the model parameters converge;
and carrying out target transformation on the image by using the neural network model obtained by training.
The above target transformation method based on the attention mechanism introduces an attention mechanism into the model so that the model can identify the target object to be converted in a target change task and thus distinguish the target from the background. Meanwhile, consistency between the backgrounds of the original image and the converted image is ensured by constructing an attention consistency loss function and a background consistency loss function.
In another embodiment, α is set to 0.000015.
In another embodiment, β is set to 0.000005.
In another embodiment, γ is set to 0.00075.
In another embodiment, the back propagation algorithm is optimized by Adam.
In another embodiment, the learning rate of the back propagation algorithm is 0.0002.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods.
A processor for running a program, wherein the program when running performs any of the methods.
A specific application scenario of the present invention is described below:
The invention aims to enable the model to distinguish the target from the background while learning to map an image set X containing one type of object to an image set Y containing another type of object. FIG. 1 shows the overall architecture of the model, which comprises 4 modules: generator G, generator F, discriminator D_X and discriminator D_Y. G is used to learn the mapping function G: X → Y, and generator F learns the inverse mapping function F: Y → X. D_X is used to distinguish the original image x from the converted image F(y); correspondingly, D_Y distinguishes the original image y from the converted image G(x). A Deep Attention Unit (DAU) is built into both generator G and generator F to extract the critical regions.
(1) Deep attention unit:
An attention mask M ∈ R^3 is extracted by constructing a Deep Attention Unit (DAU), which gives the model the capability to distinguish the target from the background. The structure of the generator after the deep attention units are added is shown in the lower part of FIG. 1.
In the encoding stage (Encode Stage), as shown in the lower half of FIG. 1, given the (n-1)-th layer feature map f_{n-1} (n ∈ {2, 3}) of an input image x, a convolution layer is used as the encoder to obtain the next-layer feature map f̂_n of x. As shown in FIG. 2(a), the DAU passes f_{n-1} through two convolution layers and then through one upsampling deconvolution layer with the sigmoid function (y = 1/(1 + e^(-x))) as its activation, obtaining an attention mask M_n of the same size as f̂_n:
M_n = σ(Deconv(Conv(Conv(f_{n-1}))))
In the decoding stage and the output stage, deep attention units denoted DAU_decode and DAU_final are likewise used, as shown in FIG. 2(b) and (c). Their process differs from that of DAU_encode in that the mask branch first passes through two deconvolution layers and then through one convolution layer:
M_n = σ(Conv(Deconv(Deconv(f_{n-1}))))
The value range of the sigmoid function is [0, 1]; the attention mask M_n can therefore be seen as a weight distribution over f̂_n, which enhances the expression of meaningful features and suppresses meaningless information. M_n and f̂_n are combined by an element-wise product, denoted H(·). Furthermore, following the residual network and the residual attention network, a shortcut is added to suppress the vanishing-gradient problem. The n-th layer feature map f_n is finally obtained through this operation:
f_n = H(f̂_n, M_n) + f̂_n
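The mask application and shortcut above amount to a single element-wise expression; a small PyTorch helper is sketched below (the function name is illustrative, not from the patent):

```python
import torch

def apply_attention(f_hat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """f_n = H(f_hat, M_n) + f_hat, with H the element-wise product.
    The shortcut (adding f_hat back) keeps information and gradients flowing
    even where the mask is close to zero, i.e. f_n = (1 + M_n) * f_hat."""
    return mask * f_hat + f_hat
```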
(2) Cycle-consistency loss function:
CycleGAN uses a cycle-consistency loss function to improve the image translation effect; the same idea is referred to as dual learning (Dual Learning) in the field of machine translation. It assumes that, for each image x in the data set X, the conversion cycle can map x back to the original image: x′ = F(y′) = F(G(x)) ≈ x, and correspondingly G(F(y)) ≈ y. Since the model herein also has a dual-learning structure, the cycle-consistency loss function L_cyc is likewise used to improve the image conversion effect of the model:
L_cyc(G, F) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1    (6)
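In code form, equation (6) is two L1 distances between the original images and their cycle reconstructions; a minimal sketch follows (the mean reduction over elements is an implementation choice, the sum would match the ||·||_1 notation literally):

```python
import torch

def cycle_consistency_loss(x, x_rec, y, y_rec):
    """L_cyc(G, F) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1,
    with x_rec = F(G(x)) and y_rec = G(F(y)) assumed precomputed."""
    return (x_rec - x).abs().mean() + (y_rec - y).abs().mean()
```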
(3) Attention consistency loss function:
Considering that the spatial position of the target in the image should remain unchanged during the conversion cycle F(G(x)), an attention consistency loss function (Attention Consistency Loss) is constructed to constrain the model:
L_att(x, G, F) = α*||M_G(x) - M_F(G(x))||_1 + β*(M_G(x) + M_F(G(x)))    (7)
M_G(x) and M_F(G(x)) denote the masks output by the last layer of the model during the generation of G(x) and F(G(x)), respectively; the value of each element represents the probability that the corresponding element of the original image belongs to the conversion target. The second term is a regularization term that prevents over-fitting of the model. α and β are the weights of the two terms in the formula.
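A sketch of equation (7) with the default weights given in the embodiments above (α = 0.000015, β = 0.000005); reducing the regularization term to a scalar by summation is an implementation assumption:

```python
import torch

def attention_consistency_loss(m_gx, m_fgx, alpha=0.000015, beta=0.000005):
    """L_att = alpha * ||M_G(x) - M_F(G(x))||_1 + beta * (M_G(x) + M_F(G(x)))."""
    return alpha * (m_gx - m_fgx).abs().sum() + beta * (m_gx + m_fgx).sum()
```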
(4) Background consistency loss function:
Once the DAU obtains the attention mask corresponding to the feature map, the model can distinguish the target from the background. A background consistency loss function (Background Consistency Loss) is therefore constructed:
L_bg(x, G) = γ*||B(x, M_G(x)) - B(G(x), M_G(x))||_1    (8)
B(x, M_G(x)) = H(x, 1 - M_G(x))    (9)
γ is a hyperparameter. B(x, M_G(x)) is a background function; the value of each element of 1 - M_G(x) represents the probability that the corresponding element of the original image belongs to the background. The background of x is obtained by computing the element-wise product of x and 1 - M_G(x). B(G(x), M_G(x)) is obtained in the same way.
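Equations (8)-(9) in code form, with the default γ = 0.00075 from the embodiments above; the function names are illustrative:

```python
import torch

def background(img, mask):
    """B(img, M) = H(img, 1 - M): keep the regions the mask assigns to the background."""
    return (1.0 - mask) * img

def background_consistency_loss(x, gx, m_gx, gamma=0.00075):
    """L_bg(x, G) = gamma * ||B(x, M_G(x)) - B(G(x), M_G(x))||_1."""
    return gamma * (background(x, m_gx) - background(gx, m_gx)).abs().sum()
```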
(5) Adversarial loss function:
The realism of the generated image can be enhanced by an adversarial loss (Adversarial Loss). For the mapping function G: X → Y and its discriminator D_Y, it is expressed as:
L_GAN(G, D_Y, X, Y) = E_y~p(y)[log D_Y(y)] + E_x~p(x)[log(1 - D_Y(G(x)))]    (10)
G attempts to make the generated image G(x) indistinguishable from the images of the data set Y, while D_Y aims to distinguish G(x) from y as far as possible. G seeks to minimize this objective function, whereas D tries to maximize it.
(6) The complete objective function:
L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + L_cyc(G, F) + L_att(x, G, F) + L_bg(x, G) + L_bg(y, F)    (11)
This translates into a min-max optimization problem:
G*, F* = arg min_{G,F} max_{D_X,D_Y} L(G, F, D_X, D_Y)    (12)
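One training iteration under this min-max objective can be sketched as follows, reusing the loss helpers sketched above. The way the generators return an (image, mask) pair, the joint optimizers, and the equal weighting of the two directions are assumptions; Adam with a 0.0002 learning rate follows the embodiments described earlier.

```python
import torch

def training_step(G, F, D_X, D_Y, x, y, opt_gen, opt_disc):
    # Forward cycle X -> Y -> X; each generator is assumed to return (image, mask).
    y_fake, m_gx = G(x)
    x_rec, m_fgx = F(y_fake)
    # Backward cycle Y -> X -> Y.
    x_fake, _ = F(y)
    y_rec, _ = G(x_fake)

    # Generators minimize: adversarial + cycle + attention + background terms.
    loss_gen = (adversarial_loss_g(D_Y(y_fake)) + adversarial_loss_g(D_X(x_fake))
                + cycle_consistency_loss(x, x_rec, y, y_rec)
                + attention_consistency_loss(m_gx, m_fgx)
                + background_consistency_loss(x, y_fake, m_gx))
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()

    # Discriminators maximize the adversarial objective (minimize its negative).
    loss_disc = (adversarial_loss_d(D_Y(y), D_Y(y_fake.detach()))
                 + adversarial_loss_d(D_X(x), D_X(x_fake.detach())))
    opt_disc.zero_grad()
    loss_disc.backward()
    opt_disc.step()
    return loss_gen.item(), loss_disc.item()

# Example optimizer setup (learning rate per the embodiments above):
# opt_gen  = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=0.0002)
# opt_disc = torch.optim.Adam(list(D_X.parameters()) + list(D_Y.parameters()), lr=0.0002)
```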
the invention has the advantages that the model can effectively identify the target object in the image, neglect irrelevant background and further improve the final visual nominal effect, and the model obtains the best effect on a plurality of comparison experiments with other current most methods.
A Deep Attention Unit (DAU) module based on the attention mechanism is first constructed herein; its purpose is to identify the target object in the image, so as to guide the model to eliminate background interference and further improve the conversion effect.
The method was validated on two data sets, ImageNet and CelebA. ImageNet is a large-scale image dataset specifically used for machine vision research. 995 apple images, 1019 orange images, 1067 horse images and 1334 zebra images were extracted from ImageNet to train the model.
FIG. 3 shows the results of the comparative experiments on the ImageNet dataset, and FIG. 4 shows the results on the CelebA dataset. It is clear that CycleGAN and VAT strongly affect the background of the original image. For example, in the second column of FIG. 3(a) and (b), the leaves fade from green to gray. In FIG. 4, the conversion by VAT fails completely: the face in the transformed image is severely deformed and the expected transformation features do not appear. For example, in the no-glasses image → glasses image conversion of FIG. 4(b), VAT does not produce an image of a face wearing glasses. In contrast, the DAU-GAN method not only completes the conversion task successfully but also effectively retains the background of the original image. For example, in the horse image → zebra image conversion of FIG. 3(c), the zebra image generated by DAU-GAN both preserves the background and shows more natural stripes.
Table 1. Mean background change value for each type of converted image.
To demonstrate the effectiveness of the method more precisely, the mean change of the converted image background over the test set was counted quantitatively. Table 1 shows the experimental results. For every conversion, the background change value of the images converted by DAU-GAN is the smallest, which strongly demonstrates that the model can preserve the background during target changes.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent should be subject to the appended claims.

Claims (9)

1. An attention-based target transformation method, comprising:
training a neural network model:
step 1, initializing parameters of a neural network model by using random numbers;
step 2, inputting an image x belonging to category X into the generator G of the model and entering the encoding stage, where a first-layer feature map f_1 is computed from x by a convolution layer;
step 3, f_1 then passes through two branch networks: (a) one convolution layer yields the second-layer feature map f̂_2 without attention-mask processing; (b) f_1 first passes through two convolution layers and then through one deconvolution layer to obtain the attention mask M_2 corresponding to f̂_2; M_2 is multiplied element-wise with f̂_2, and the resulting product is added element-wise to f̂_2 to obtain the processed second-layer feature map f_2;
step 4, f_2 yields the next-layer feature map f_3 in the same manner as step 3; f_3 then passes through 6 residual convolution layers with convolution kernel size 3 x 3 and stride 1 to extract finer features;
step 5, entering the decoding stage, with deconvolution layers used as the decoder; f_3 passes through two branch networks: (a) one deconvolution layer yields the feature map f̂_4 without attention-mask processing; (b) f_3 first passes through two deconvolution layers and then through one convolution layer to obtain the attention mask M_4 corresponding to f̂_4; M_4 is multiplied element-wise with f̂_4, and the resulting product is added element-wise to f̂_4 to obtain the processed feature map f_5;
step 6, entering the output stage, f_5 passes through two branch networks: (a) one deconvolution layer yields the converted image y′; (b) two deconvolution layers followed by one convolution layer yield the attention mask M_G(x) corresponding to y′;
step 7, y′ is input into another generator F and, after the same operations as in steps 2-6, x′ and the corresponding attention mask M_F(G(x)) are obtained;
step 8, x and x′ are input into the discriminator D_X, which returns the probability that the input image belongs to category X; likewise, y and y′ are input into the discriminator D_Y to obtain the probabilities that y and y′ belong to category Y; the values of the adversarial loss functions are then calculated:
L_GAN(G, D_Y, X, Y) = E_y~p(y)[log D_Y(y)] + E_x~p(x)[log(1 - D_Y(G(x)))]    (1)
L_GAN(F, D_X, Y, X) = E_x~p(x)[log D_X(x)] + E_y~p(y)[log(1 - D_X(F(y)))]    (2)
step 9, the value of the cycle-consistency loss function is calculated from x, x′, y, y′:
L_cyc(G, F) = ||x′ - x||_1 + ||y′ - y||_1    (3)
step 10, M_G(x) is used to separate the background from the conversion target in x and y′, and the background change loss is calculated:
L_bg(x, G) = γ*||B(x, M_G(x)) - B(y′, M_G(x))||_1    (4)
B(x, M_G(x)) = H(x, 1 - M_G(x))    (5)
γ is set to 0.000075 to 0.0075; the value of the function H(K, L) is obtained by multiplying the elements of K element-wise with the elements of L; likewise, M_F(G(x)) together with y and x is used to calculate the background change loss L_bg(y, F);
step 11, the attention change loss is calculated from M_G(x) and M_F(G(x)):
L_att(x, G, F) = α*||M_G(x) - M_F(G(x))||_1 + β*(M_G(x) + M_F(G(x)))    (6)
α is set to 0.000003 to 0.00015, and β is set to 0.0000005 to 0.00005;
step 12, the model parameters are adjusted according to the errors obtained in steps 8-11 by a back-propagation algorithm with a learning rate of 0.00002 to 0.002;
step 13, y is taken as the input image and the error is calculated by the operations of steps 2-11, except that y passes through generator F first and then through generator G; the model parameters are adjusted according to the method of step 12;
step 14, steps 2-13 are repeated until the model parameters converge;
and carrying out target transformation on the image by using the neural network model obtained by training.
2. The attention-based target transformation method of claim 1, wherein α is set to 0.000015.
3. The attention-based target transformation method of claim 1, wherein β is set to 0.000005.
4. The attention-based target transformation method of claim 1, wherein γ is set to 0.00075.
5. The attention-based target transformation method of claim 1, wherein the back-propagation algorithm is optimized by Adam.
6. The attention-based target transformation method of claim 1, wherein the learning rate of the back-propagation algorithm is 0.0002.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 6 are implemented when the program is executed by the processor.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
9. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 6.
CN201810866277.0A 2018-08-01 2018-08-01 Target transformation method based on attention mechanism Active CN109033095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810866277.0A CN109033095B (en) 2018-08-01 2018-08-01 Target transformation method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810866277.0A CN109033095B (en) 2018-08-01 2018-08-01 Target transformation method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN109033095A CN109033095A (en) 2018-12-18
CN109033095B true CN109033095B (en) 2022-10-18

Family

ID=64647612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810866277.0A Active CN109033095B (en) 2018-08-01 2018-08-01 Target transformation method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN109033095B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109712068A (en) * 2018-12-21 2019-05-03 云南大学 Image Style Transfer and analogy method for cucurbit pyrography
CN109784197B (en) * 2018-12-21 2022-06-07 西北工业大学 Pedestrian re-identification method based on hole convolution and attention mechanics learning mechanism
CN109829537B (en) * 2019-01-30 2023-10-24 华侨大学 Deep learning GAN network children's garment based style transfer method and equipment
CN111325318B (en) * 2019-02-01 2023-11-24 北京地平线机器人技术研发有限公司 Neural network training method, neural network training device and electronic equipment
CN109902602B (en) * 2019-02-16 2021-04-30 北京工业大学 Method for identifying foreign matter material of airport runway based on antagonistic neural network data enhancement
CN110033410B (en) * 2019-03-28 2020-08-04 华中科技大学 Image reconstruction model training method, image super-resolution reconstruction method and device
CN110084794B (en) * 2019-04-22 2020-12-22 华南理工大学 Skin cancer image identification method based on attention convolution neural network
CN110634101B (en) * 2019-09-06 2023-01-31 温州大学 Unsupervised image-to-image conversion method based on random reconstruction
CN110766638A (en) * 2019-10-31 2020-02-07 北京影谱科技股份有限公司 Method and device for converting object background style in image
CN111489287B (en) * 2020-04-10 2024-02-09 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
CN111815570B (en) * 2020-06-16 2024-08-30 浙江大华技术股份有限公司 Regional intrusion detection method and related device thereof
CN112884773B (en) * 2021-01-11 2022-03-04 天津大学 Target segmentation model based on target attention consistency under background transformation
CN113256592B (en) * 2021-06-07 2021-10-08 中国人民解放军总医院 Training method, system and device of image feature extraction model
CN113538224B (en) * 2021-09-14 2022-01-14 深圳市安软科技股份有限公司 Image style migration method and device based on generation countermeasure network and related equipment
CN113808011B (en) * 2021-09-30 2023-08-11 深圳万兴软件有限公司 Style migration method and device based on feature fusion and related components thereof
CN113657560B (en) * 2021-10-20 2022-04-15 南京理工大学 Weak supervision image semantic segmentation method and system based on node classification

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009525A (en) * 2017-12-25 2018-05-08 Specific ground-target recognition method for unmanned aerial vehicles based on convolutional neural networks

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009525A (en) * 2017-12-25 2018-05-08 Specific ground-target recognition method for unmanned aerial vehicles based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAU-GAN: Unsupervised Object Transfiguration via Deep Attention Unit; Zihan Ye et al.; BICS 2018; 2018-07-09; pp. 120-129 *
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks; Jun-Yan Zhu et al.; arXiv:1703.10593v1; 2017-03-30; pp. 1-18 *

Also Published As

Publication number Publication date
CN109033095A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109033095B (en) Target transformation method based on attention mechanism
Denton et al. Semi-supervised learning with context-conditional generative adversarial networks
CN111079532B (en) Video content description method based on text self-encoder
CN104866900A (en) Deconvolution neural network training method
Jiang et al. When to learn what: Deep cognitive subspace clustering
CN106157254A (en) Rarefaction representation remote sensing images denoising method based on non local self-similarity
Uddin et al. A perceptually inspired new blind image denoising method using L1 and perceptual loss
Pieters et al. Comparing generative adversarial network techniques for image creation and modification
CN115984745A (en) Moisture control method for black garlic fermentation
CN116342379A (en) Flexible and various human face image aging generation system
CN111428181A (en) Bank financing product recommendation method based on generalized additive model and matrix decomposition
CN115526223A (en) Score-based generative modeling in a potential space
Wang et al. Learning to hallucinate face in the dark
Zhou et al. Personalized and occupational-aware age progression by generative adversarial networks
Zhu et al. Multiview Deep Subspace Clustering Networks
CN105260736A (en) Fast image feature representing method based on normalized nonnegative sparse encoder
Cong et al. Gradient-semantic compensation for incremental semantic segmentation
Li et al. Adaptive sparsity-regularized deep dictionary learning based on lifted proximal operator machine
Oza et al. Semi-supervised image-to-image translation
CN116977343A (en) Image processing method, apparatus, device, storage medium, and program product
CN115601257A (en) Image deblurring method based on local features and non-local features
Hah et al. Information‐Based Boundary Equilibrium Generative Adversarial Networks with Interpretable Representation Learning
Islam et al. Class aware auto encoders for better feature extraction
CN113222100A (en) Training method and device of neural network model
CN109840888A (en) A kind of image super-resolution rebuilding method based on joint constraint

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant