CN113378949A - Dual generative adversarial learning method based on capsule network and mixed attention - Google Patents

Dual generative adversarial learning method based on capsule network and mixed attention

Info

Publication number
CN113378949A
CN113378949A (application CN202110690163.7A)
Authority
CN
China
Prior art keywords
layer
vector space
attention
self
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110690163.7A
Other languages
Chinese (zh)
Inventor
王蒙
陈家兴
王强
李鑫凯
邵逸轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110690163.7A
Publication of CN113378949A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a dual generative adversarial learning method based on a capsule network and mixed attention, and belongs to the field of artificial intelligence and image processing. The invention relates to an image generation method combining a capsule network, self-attention, mixed attention and adversarial learning. The sample space generator in the sample space self-adversarial module is a deep generative model based on mixed attention, and the sample space discriminator draws on the LeNet-5 network. The invention improves the accuracy and efficiency of adversarial learning in the task of generating clear images from few training samples, reduces the training time and the number of training samples required, has stronger generalization capability, and its effectiveness is verified on MNIST and other benchmark datasets.

Description

Dual generative adversarial learning method based on capsule network and mixed attention
Technical Field
The invention relates to a method for generating clear images from few samples, in particular to a dual generative adversarial learning method based on a capsule network and mixed attention, and belongs to the field of artificial intelligence and image processing.
Background
Image generation is an important issue in computer vision. In computer vision research over the years, attention mechanisms have been extensively studied and have been used to improve the performance of modern deep neural networks. Attention mechanisms have proven useful in a variety of computer vision tasks, such as image generation and image classification.
Much recent work has proposed using channel attention, spatial attention, or both to improve the performance of these neural networks. Such attention mechanisms improve the feature representations produced by standard convolutional layers by modelling correlations between channels (channel attention) or by weighting spatial positions (spatial attention). The intuition behind learning attention weights is to enable the network to learn where to attend and to focus further on the target object. This idea has been advanced further by introducing convolutions with large kernels to encode spatial information.
In 2017, Hinton et al. proposed the capsule network together with a dynamic routing algorithm applied to the update from primary capsules to digit capsules, and successfully applied it to recognition on the MNIST dataset. A matrix capsule structure was proposed later, which uses a matrix to express the pose relations between objects and an EM algorithm to perform dynamic updates between capsules. Recently, Efficient-CapsNet added an attention module between capsule layers, reducing the number of primary capsules to 2% of the original and demonstrating the effectiveness of attention modules in improving capsule network performance.
At the same time, with the advent of the generative adversarial network (GAN), significant progress has been made on the image generation task (Goodfellow et al., 2014), but many problems remain unsolved. GANs based on deep convolutional networks have been particularly successful. However, careful inspection of the generated samples shows that, while advanced ImageNet GAN models are adept at generating image classes with few structural constraints (e.g., ocean, sky and landscape classes, which are distinguished more by texture than by geometry), they fail to capture geometric or structural patterns that recur in certain classes.
A generative adversarial network comprises two models, a generator G and a discriminator D, which are trained simultaneously: D is trained to maximize the probability of correctly labelling training samples and samples from G, while the parameters of the generator G are adjusted by minimizing log(1 - D(G(z))).
In an unconditional generator, the mode of the generated data is not controllable. However, when the data are labelled, the label can conveniently be used as a condition on the network input, as in CGAN. A related idea decomposes the noise source into an incompressible source and a latent code via a variational autoencoder, seeking the latent factors of feature variation by maximizing the mutual information between the variational autoencoder and the generator. Such latent codes can be used to discover object classes in an unsupervised manner; the learned representations carry rich semantic information and can handle complex, entangled factors of image appearance, including pose, lighting and changes in the emotional content of facial images.
Variational autoencoders and generative adversarial networks have become increasingly mature, but each has its own advantages and disadvantages. The variational autoencoder trains quickly and stably, but the generated images are blurry; generative adversarial networks often suffer from unstable training, mode collapse, incomplete extraction of intermediate feature information and loss of feature information. Capsule networks have mostly been used for classification tasks, but their effectiveness on generative tasks has also been documented. Future work aims to combine these deep network models so that they complement one another and overcome their respective limitations.
Disclosure of Invention
The invention aims to provide a dual generative adversarial learning method based on a capsule network and mixed attention for the task of generating clear images from few training samples, addressing the defects and shortcomings of the prior art.
The technical scheme adopted by the invention is as follows: a dual generative adversarial learning method based on a capsule network and mixed attention comprises an autoencoder module E based on self-attention and a capsule network, a vector space self-adversarial module and a sample space self-adversarial module;
the autoencoder module E based on self-attention and a capsule network comprises an autoencoder input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-adversarial module comprises a vector space generator G_A and a vector space discriminator D_A;
the sample space self-adversarial module comprises a sample space generator G_B and a sample space discriminator D_B.
On the basis of a basic generative adversarial learning model, the autoencoder module based on self-attention and a capsule network is applied to the construction of a real vector space, which improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required. The invention also adds a vector space self-adversarial module and a sample space self-adversarial module, improving the robustness of the whole framework and the clarity and realism of the finally generated virtual images.
In addition, the basic architecture of the sample space generator and the sample space discriminator is improved: the sample space generator introduces a channel attention module and a spatial attention module into the feature mapping process, attending respectively to the information of different channels of the convolutional layer and to its spatial information, thereby reducing information loss during model training and improving the stability and accuracy of feature extraction; the sample space discriminator uses a LeNet-5-style network, in which the first-stage and second-stage convolution-pooling operations aim to improve the accuracy of discrimination.
The overall architecture of the method is shown in Fig. 1, and the total training loss function L is:

L = L_E + L_{G_A} + L_{D_A} + L_{G_B} + L_{D_B}

where L_E is the loss function of the autoencoder, L_{G_A} and L_{D_A} are the loss functions of the vector space generator G_A and discriminator D_A respectively, and L_{G_B} and L_{D_B} are the training loss functions of the sample space generator G_B and discriminator D_B. The method comprises the following specific steps (a minimal training-loop sketch is given after the steps below):
(1) preprocessing a real picture input to the autoencoder input layer, and randomly sampling random noise z and a sample label L from the feature distribution of the real picture;
(2) the autoencoder module E encodes the real picture to obtain a real vector space Z_e composed of the final capsule layer, which is input to the vector space discriminator D_A; the vector space generator G_A generates a near-real virtual vector space Z_a from the random noise z and the sample label L extracted in step (1), which is input to the vector space discriminator D_A and to the sample space generator G_B;
(3) the vector space discriminator D_A judges whether the vector space input to it in step (2) is the real vector space Z_e or the virtual vector space Z_a, and feeds the judgment back to the vector space discriminator D_A and the vector space generator G_A;
(4) the sample space generator G_B generates a virtual image from the virtual vector space Z_a input in step (2) and the sample label L extracted in step (1), and inputs the virtual image to the sample space discriminator D_B; the real picture from step (1) is also input to the sample space discriminator D_B;
(5) the sample space discriminator D_B judges whether the image input to it in step (4) is a real picture or a virtual image, and feeds the judgment back to the sample space discriminator D_B and the sample space generator G_B.
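As an illustration of how the five steps above can be wired together, the following PyTorch-style sketch performs one training iteration with standard binary cross-entropy adversarial losses. It is a minimal sketch under assumptions: the module classes E, G_A, D_A, G_B, D_B, their constructor arguments and the opts dictionary of optimizers are hypothetical placeholders for the architectures described later in this document, and the autoencoder's own margin-loss update is omitted here.

```python
# Minimal sketch of one training iteration for the dual adversarial scheme in steps (1)-(5).
import torch
import torch.nn.functional as F

def train_step(E, G_A, D_A, G_B, D_B, x, label, opts, latent_dim=128):
    """x: batch of real pictures, label: one-hot sample labels, opts: dict of optimizers."""
    bs = x.size(0)
    z = torch.randn(bs, latent_dim)                       # step (1): random noise z
    real, fake = torch.ones(bs, 1), torch.zeros(bs, 1)

    z_e = E(x)                                            # step (2): real vector space Z_e
    z_a = G_A(z, label)                                   # step (2): virtual vector space Z_a

    # step (3): vector space discriminator D_A, then vector space generator G_A
    loss_D_A = F.binary_cross_entropy(D_A(z_e.detach()), real) + \
               F.binary_cross_entropy(D_A(z_a.detach()), fake)
    opts['D_A'].zero_grad(); loss_D_A.backward(); opts['D_A'].step()

    loss_G_A = F.binary_cross_entropy(D_A(G_A(z, label)), real)
    opts['G_A'].zero_grad(); loss_G_A.backward(); opts['G_A'].step()

    # step (4): sample space generator G_B produces a virtual image from Z_a and the label
    x_fake = G_B(z_a.detach(), label)

    # step (5): sample space discriminator D_B, then sample space generator G_B
    loss_D_B = F.binary_cross_entropy(D_B(x), real) + \
               F.binary_cross_entropy(D_B(x_fake.detach()), fake)
    opts['D_B'].zero_grad(); loss_D_B.backward(); opts['D_B'].step()

    loss_G_B = F.binary_cross_entropy(D_B(G_B(z_a.detach(), label)), real)
    opts['G_B'].zero_grad(); loss_G_B.backward(); opts['G_B'].step()
    # The autoencoder E is additionally trained with the margin loss L_E described below.
```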
As described in the background, the basic generative adversarial network is in fact a minimax game between the generator G and the discriminator D over the cost function V(D, G):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]

where P_data is the feature distribution of the input samples, P_z is the distribution of the random noise, x is a real picture, z is random noise, E_{z~P_z} denotes the expectation with z drawn from P_z, and E_{x~P_data} denotes the expectation with x drawn from P_data.
However, by adding extra information to condition the model, the data generation process can be guided. The generative adversarial network can be generalized to a conditional generative model, provided that both the generator and the discriminator receive some additional information y. Here y may be any type of auxiliary information; it is fed to the discriminator and the generator as an additional input layer, acting as a control condition. The objective function of the conditional generative adversarial network is then:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x \mid y)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z \mid y)))]

The training loss functions of all generators and discriminators involved in the present invention are based on the above theory and formulas.
Further, the autoencoder module E in step (2) is built from an attention-based capsule network; its architecture is shown in detail in Fig. 2. The specific operation steps are:
(1.1) inputting a real picture at the autoencoder input layer;
(1.2) performing a parallel convolution operation on the real picture through the parallel convolution layer to obtain feature maps of the real picture;
(1.3) repeatedly extracting and compressing the information in the feature maps of step (1.2) through the self-attention layer, and outputting the result as the primary capsule layer;
(1.4) further compressing the primary capsule layer into the final capsule layer through a compression operation, and feeding the real vector space Z_e formed by the final capsule layer to the vector space discriminator D_A.
The parallel convolution layer of this module applies convolution kernels of size 3 × 3, 5 × 5, 7 × 7 and 9 × 9 in parallel. On the one hand, the parallel convolution with 4 different kernel sizes captures positional information of the real picture at different resolutions; on the other hand, it speeds up model training and reduces the parameters and complexity of the network. The parallel convolution layer yields 256 feature maps of size 4 × 4, where each group of 64 feature maps carries the spatial position information obtained with one kernel size. The operation formula is:

T = \mathrm{ReLU}(\mathrm{Conv}_{k \times k}(x))

where x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature maps obtained after the parallel convolution operation.
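As a concrete illustration, the following sketch shows one way to realize such a parallel convolution layer in PyTorch. The strides and padding needed to reach 64 maps of size 4 × 4 per branch are not specified in the text, so the adaptive pooling used here is an assumption.

```python
import torch
import torch.nn as nn

class ParallelConv(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 / 9x9 convolutions, 64 maps each (256 in total).

    The reduction of each branch to 4x4 feature maps is done here with adaptive
    pooling; the text does not state how the 4x4 resolution is reached, so this
    detail is an assumption.
    """
    def __init__(self, in_ch=1):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, 64, kernel_size=k, padding=k // 2) for k in (3, 5, 7, 9)
        ])
        self.pool = nn.AdaptiveAvgPool2d(4)

    def forward(self, x):
        # T = ReLU(Conv_{k x k}(x)) for each kernel size, concatenated on channels
        feats = [self.pool(torch.relu(conv(x))) for conv in self.branches]
        return torch.cat(feats, dim=1)            # shape: (batch, 256, 4, 4)

# usage sketch on an MNIST-sized input:
# maps = ParallelConv()(torch.randn(8, 1, 28, 28))   # -> (8, 256, 4, 4)
```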
Then, the 64 feature maps obtained with the same convolution kernel form one group (4 groups in total). The feature maps of each group are passed n times through a self-attention module (Attention) for feature extraction and compression, and the resulting feature matrices from every pass are accumulated to obtain the primary capsule layer:

T_n = \mathrm{Attention}(T_{n-1})

U = \sum_{i=1}^{n} T_i

where Attention denotes the self-attention module, T_n denotes the result obtained after the n-th extraction and compression, T_{n-1} denotes the result obtained after the (n-1)-th extraction and compression, i = 1, 2, …, n, and U is the primary capsule layer.
Then, the primary capsule layer is compressed to obtain the final capsule layer, which forms the output Z_e; each capsule in the final capsule layer stores high-level characteristics of the corresponding category, such as pose, orientation and stroke thickness. The compression (squash) operation is:

Z_e = \mathrm{squash}(U) = \frac{\|U\|^2}{1 + \|U\|^2} \cdot \frac{U}{\|U\|}
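For reference, the squash nonlinearity above can be written as a short function; this follows the standard capsule-network formulation, which the text appears to use.

```python
import torch

def squash(u: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Standard capsule squash: shrinks short vectors toward 0, long vectors toward unit length."""
    sq_norm = (u ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * u / torch.sqrt(sq_norm + eps)
```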
in the overall model, the task of the self-encoder module is to map the samples of each class into a potential vector space. The potential vector space can express the characteristics of each class sample to the maximum extent. To achieve this, the self-encoder model uses a method similar to training a classifier, i.e., minimizing the edge loss function LaTo train:
La=Tamax(0,m+-||va||)2+λ(1-Ta)max(0,||va||-m-)
Taindicating whether the predicted class is the current class, m+For the upper bound of the loss, 0.9 m is taken-To lower bound of loss, take 0.1, vaFor the representative vector of the a-th category, the length is taken to represent the existing probability, and finally, the ratio between the two is balanced by a hyperparameter lambda. Loss L from the encoderEFor the average loss for each class, the formula is as follows:
Figure BDA0003125886380000052
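A hedged sketch of this margin loss, written directly against the formula above; the default λ = 0.5 is the conventional capsule-network value and is an assumption, as the text does not state it.

```python
import torch

def margin_loss(v, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v: final capsule vectors (batch, num_classes, dim); target: one-hot labels (batch, num_classes)."""
    v_norm = v.norm(dim=-1)                                    # ||v_a||
    pos = target * torch.clamp(m_pos - v_norm, min=0.0) ** 2   # T_a * max(0, m+ - ||v_a||)^2
    neg = lam * (1 - target) * torch.clamp(v_norm - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=-1).mean()                      # L_E: averaged over classes and batch

# usage: loss_E = margin_loss(final_capsules, one_hot_labels)
```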
further, a vector space generator G in the vector space self-countermeasure moduleASum vector space discriminator DASee fig. 3 and 4 for details of their training loss function LGAAnd LDAThe formulas are respectively as follows:
Figure BDA0003125886380000053
Figure BDA0003125886380000054
where x is the real picture, z is random noise, PrIs a distribution of data samples, PzFor the distribution of random noise, E (x) is the result of the real picture passing through the self-encoder module, Ez~PzDenotes z corresponds to PzDistribution of (E), Ex~PrIndicates that x corresponds to PrDistribution of (2).
The vector space generator G_A in step (2) comprises a vector space generator input layer, three hidden layers and a linear output layer; the three hidden layers are fully connected layers activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
The vector space discriminator D_A in step (3) comprises a vector space discriminator input layer, three hidden layers and a nonlinear output layer; the three hidden layers are fully connected layers with batch normalization and the ReLU activation function (BN + ReLU), with 512, 1024 and 512 neurons respectively.
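The two fully connected networks just described can be sketched as follows. The input and output dimensions (latent_dim, the capsule-vector size vec_dim, and one-hot label conditioning) are not fixed by the text and are assumptions, and the final sigmoid of D_A is one reading of "nonlinear output layer".

```python
import torch
import torch.nn as nn

class VectorGenerator(nn.Module):
    """G_A: noise z (+ label) -> virtual vector space Z_a. Hidden sizes 512-1024-512, Leaky ReLU."""
    def __init__(self, latent_dim=128, num_classes=10, vec_dim=160):
        super().__init__()
        layers, sizes = [], [latent_dim + num_classes, 512, 1024, 512]
        for i, o in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(i, o), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers, nn.Linear(512, vec_dim))   # linear output layer

    def forward(self, z, label):
        return self.net(torch.cat([z, label], dim=1))

class VectorDiscriminator(nn.Module):
    """D_A: vector-space sample -> probability of being the real vector space Z_e. BN + ReLU hidden layers."""
    def __init__(self, vec_dim=160):
        super().__init__()
        layers, sizes = [], [vec_dim, 512, 1024, 512]
        for i, o in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(i, o), nn.BatchNorm1d(o), nn.ReLU()]
        self.net = nn.Sequential(*layers, nn.Linear(512, 1), nn.Sigmoid())

    def forward(self, v):
        return self.net(v)
```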
Further, the sample space generator G_B and the sample space discriminator D_B in the sample space self-adversarial module are shown in detail in Fig. 5 and Fig. 6. Their training loss functions L_{G_B} and L_{D_B} take the same adversarial form:

L_{G_B} = -\mathbb{E}_{z \sim P_z}[\log D_B(G_B(G_A(z)))]

L_{D_B} = -\mathbb{E}_{x \sim P_r}[\log D_B(x)] - \mathbb{E}_{z \sim P_z}[\log(1 - D_B(G_B(G_A(z))))]

where x is the real picture and z is random noise.
The sample space generator G_B in step (4) comprises a convolution layer with a 1 × 1 convolution kernel, a mixed attention module and a deconvolution layer, where the mixed attention module comprises a channel attention module and a spatial attention module. The specific operation steps are as follows (a sketch of the mixed attention module is given after the formulas below):
(2.1) the virtual vector space Z_a passes through the convolution layer to obtain a feature map F;
(2.2) the output of the channel attention module applied to the feature map F is multiplied element-wise (tensor multiplication) with F to obtain the feature map F_c; the output of the spatial attention module applied to F_c is multiplied with F_c to obtain F_s;
(2.3) the result obtained after reshaping the sample label L is multiplied with the feature map F, and the product is then added (matrix addition) to the feature maps F_c and F_s in turn to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain a virtual image, which is input to the sample space discriminator D_B.
The operation formulas of step (2.2) are:

F_c = \sigma(M_c(F)) \otimes F

F_s = \sigma(M_s(F_c)) \otimes F_c

where ⊗ denotes tensor (element-wise) multiplication, M_c denotes the channel attention module, M_s denotes the spatial attention module, and σ denotes the sigmoid function. The operation formula of step (2.3) is:

F^* = (\mathrm{reshape}(L) \otimes F) \oplus F_c \oplus F_s

where ⊕ denotes matrix addition.
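The internal structure of the channel and spatial attention modules is not spelled out in the text; the sketch below assumes the widely used CBAM-style construction, which matches the σ, M_c and M_s notation above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):            # M_c
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))

    def forward(self, F_):
        avg = self.mlp(F_.mean(dim=(2, 3)))
        mx = self.mlp(F_.amax(dim=(2, 3)))
        w = torch.sigmoid(avg + mx)[..., None, None]   # sigma(M_c(F))
        return w * F_                                   # F_c = sigma(M_c(F)) (x) F

class SpatialAttention(nn.Module):            # M_s
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=k, padding=k // 2)

    def forward(self, F_c):
        s = torch.cat([F_c.mean(dim=1, keepdim=True),
                       F_c.amax(dim=1, keepdim=True)], dim=1)
        w = torch.sigmoid(self.conv(s))                 # sigma(M_s(F_c))
        return w * F_c                                  # F_s = sigma(M_s(F_c)) (x) F_c
```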
The sample space discriminator D_B in step (5) comprises a first-stage convolution layer, a first-stage pooling layer, a second-stage convolution layer, a second-stage pooling layer, a fully connected layer and the sample space discriminator output layer. The specific operation steps are as follows (a sketch is given after the formulas below):
(3.1) the real picture or the virtual image passes through the first-stage convolution layer and the first-stage pooling layer in turn to obtain a feature map y*;
(3.2) the feature map y* passes through the second-stage convolution layer and the second-stage pooling layer in turn, and the result is passed through the sample space discriminator output layer to give the output y_b.
The operation formulas of step (3.1) and step (3.2) are respectively:

y^* = \mathrm{MaxPool}(\mathrm{Conv}(p))

y_b = \sigma(\mathrm{MaxPool}(\mathrm{Conv}(y^*)))

where MaxPool is the pooling operation and p is the input real picture or virtual image. The two stages of convolution and pooling improve the accuracy and effectiveness of the discrimination.
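A minimal LeNet-5-style reading of this discriminator follows. The channel counts, kernel sizes and the 28 × 28 input size follow the classic LeNet-5 layout and the MNIST example used later, not an explicit statement in the text.

```python
import torch
import torch.nn as nn

class SampleDiscriminator(nn.Module):
    """D_B: two conv + max-pool stages, a fully connected layer and a sigmoid output."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(),
                                    nn.MaxPool2d(2))           # y* = MaxPool(Conv(p))
        self.stage2 = nn.Sequential(nn.Conv2d(6, 16, 5), nn.ReLU(),
                                    nn.MaxPool2d(2))           # MaxPool(Conv(y*))
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(16 * 5 * 5, 84), nn.ReLU(),
                                  nn.Linear(84, 1), nn.Sigmoid())

    def forward(self, p):                                       # p: (batch, 1, 28, 28)
        return self.head(self.stage2(self.stage1(p)))           # y_b
```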
The beneficial effects of the invention are as follows: applying the autoencoder module based on self-attention and a capsule network to the construction of the real vector space improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required; the added vector space self-adversarial module and sample space self-adversarial module improve the robustness of the whole framework and the clarity and realism of the finally generated virtual images; the sample space generator introduces a channel attention module and a spatial attention module into the feature mapping process, attending respectively to the information of different channels of the convolutional layer and to its spatial information, thereby reducing information loss during model training and improving the stability and accuracy of feature extraction; the sample space discriminator uses a LeNet-5-style network, in which the first-stage and second-stage convolution-pooling operations improve the accuracy of discrimination.
Drawings
FIG. 1 is a block diagram of the dual generative adversarial learning method based on a capsule network and mixed attention;
FIG. 2 is the attention autoencoder E;
FIG. 3 is the latent vector space generator G_A;
FIG. 4 is the latent vector space discriminator D_A;
FIG. 5 is the sample space generator G_B;
FIG. 6 is the sample space discriminator D_B;
FIG. 7 shows the results of comparison experiments between the present invention and other advanced adversarial learning networks, using the MNIST dataset as an example.
Detailed Description
Example 1: the invention is further described with reference to the figures and to training on the MNIST dataset. A dual generative adversarial learning method based on a capsule network and mixed attention comprises an autoencoder module E based on self-attention and a capsule network, a vector space self-adversarial module and a sample space self-adversarial module;
the autoencoder module E based on self-attention and a capsule network comprises an autoencoder input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-adversarial module comprises a vector space generator G_A and a vector space discriminator D_A;
the sample space self-adversarial module comprises a sample space generator G_B and a sample space discriminator D_B.
Fig. 1 is a framework diagram of the operation flow of an embodiment of the present invention; the method comprises the following steps:
(1) preprocessing a real picture input to the autoencoder input layer, and randomly sampling random noise z and a sample label L from the feature distribution of the real picture;
(2) the autoencoder module E encodes the real picture to obtain a real vector space Z_e composed of the final capsule layer, which is input to the vector space discriminator D_A; the vector space generator G_A generates a near-real virtual vector space Z_a from the random noise z and the sample label L extracted in step (1), which is input to the vector space discriminator D_A and to the sample space generator G_B;
(3) the vector space discriminator D_A judges whether the vector space input to it in step (2) is the real vector space Z_e or the virtual vector space Z_a, and feeds the judgment back to the vector space discriminator D_A and the vector space generator G_A;
(4) the sample space generator G_B generates a virtual image from the virtual vector space Z_a input in step (2) and the sample label L extracted in step (1), and inputs the virtual image to the sample space discriminator D_B; the real picture from step (1) is also input to the sample space discriminator D_B;
(5) the sample space discriminator D_B judges whether the image input to it in step (4) is a real picture or a virtual image, and feeds the judgment back to the sample space discriminator D_B and the sample space generator G_B.
Further, the autoencoder module E in step (2) is based on self-attention and a capsule network, and its architecture is shown in Fig. 2. Applying this autoencoder to the construction of the real vector space improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required. The specific operation steps are:
(1.1) inputting a real picture at the autoencoder input layer;
(1.2) performing a parallel convolution operation on the real picture through the parallel convolution layer to obtain feature maps of the real picture;
(1.3) repeatedly extracting and compressing the information in the feature maps of step (1.2) through the self-attention layer, and outputting the result as the primary capsule layer;
(1.4) further compressing the primary capsule layer into the final capsule layer through the compression function squash, and feeding the real vector space Z_e formed by the final capsule layer to the vector space discriminator D_A.
In step (1.2), the parallel convolution layer of this module applies convolution kernels of size 3 × 3, 5 × 5, 7 × 7 and 9 × 9 in parallel; the parallel convolution with 4 different kernel sizes captures positional information of the real picture at different resolutions, speeds up model training and reduces the parameters and complexity of the network. The parallel convolution layer yields 256 feature maps of size 4 × 4, where each group of 64 feature maps carries the spatial position information obtained with one kernel size. The operation formula is:

T = \mathrm{ReLU}(\mathrm{Conv}_{k \times k}(x))

where x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature maps obtained after the parallel convolution operation.
Then, in step (1.3), the 64 feature maps obtained with the same convolution kernel form one group (4 groups in total). The feature maps of each group are passed n times through a self-attention module (Attention) for feature extraction and compression, and the resulting feature matrices from every pass are accumulated to obtain the primary capsule layer:

T_n = \mathrm{Attention}(T_{n-1})

U = \sum_{i=1}^{n} T_i

where Attention denotes the self-attention module, T_n denotes the result obtained after the n-th extraction and compression, T_{n-1} denotes the result obtained after the (n-1)-th extraction and compression, i = 1, 2, …, n, and U is the primary capsule layer.
Then, in step (1.4), the primary capsule layer is compressed to obtain the final capsule layer, which forms the output Z_e; each capsule in the final capsule layer stores high-level characteristics of the corresponding category, such as pose, orientation and stroke thickness. The compression (squash) operation is:

Z_e = \mathrm{squash}(U) = \frac{\|U\|^2}{1 + \|U\|^2} \cdot \frac{U}{\|U\|}
in the overall model, the task of the self-encoder module is to map the samples of each class into a potential vector space. The potential vector space can express the characteristics of each class sample to the maximum extent. To achieve this, the self-encoder model uses a method similar to training a classifier, i.e., minimizing the edge loss function LaTo train:
La=Tamax(0,m+-||va||)2+λ(1-Ta)max(0,||va||-m-)
Taindicating whether the predicted class is the current class, m+For the upper bound of the loss, 0.9 m is taken-To lower bound of loss, take 0.1, vaFor the representative vector of the a-th category, the length is taken to represent the existing probability, and finally, the ratio between the two is balanced by a hyperparameter lambda. Loss L from the encoderEFor the average loss for each class, the formula is as follows:
Figure BDA0003125886380000101
the invention also adds a vector space self-countermeasure module and a sample space self-countermeasure module, and improves the robustness of the whole framework and the definition and the reality of the finally generated virtual image. And, the basic architecture of the sample space generator and the sample space discriminator is improved: the sample space generator introduces a channel attention module and a space attention module in the feature mapping process, and focuses on information of different channels of the convolutional layer and space information of the convolutional layer respectively, so that the loss and the loss of the information in the model training process are reduced, and the stability and the accuracy of feature extraction are improved; the sample space discriminator uses LeNet-5 network for discrimination, wherein the first-stage and second-stage convolution pooling operations aim to improve the accuracy of discrimination.
Further, the vector space generator G_A in step (2) comprises a vector space generator input layer, three hidden layers and a linear output layer; the three hidden layers are fully connected layers activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
The vector space discriminator D_A in step (3) comprises a vector space discriminator input layer, three hidden layers and a nonlinear output layer; the three hidden layers are fully connected layers with batch normalization and the ReLU activation function (BN + ReLU), with 512, 1024 and 512 neurons respectively.
The vector space generator G_A and the vector space discriminator D_A of the vector space self-adversarial module are shown in detail in Fig. 3 and Fig. 4; their training loss functions L_{G_A} and L_{D_A} are:

L_{G_A} = -\mathbb{E}_{z \sim P_z}[\log D_A(G_A(z))]

L_{D_A} = -\mathbb{E}_{x \sim P_r}[\log D_A(E(x))] - \mathbb{E}_{z \sim P_z}[\log(1 - D_A(G_A(z)))]

where x is the real picture, z is random noise, P_r is the distribution of the data samples, P_z is the distribution of the random noise, E(x) is the result of passing the real picture through the autoencoder module, E_{z~P_z} denotes the expectation with z drawn from P_z, and E_{x~P_r} denotes the expectation with x drawn from P_r.
Further, the sample space generator G_B in step (4) comprises a convolution layer with a 1 × 1 convolution kernel, a mixed attention module and a deconvolution layer, where the mixed attention module comprises a channel attention module and a spatial attention module. The specific operation steps are:
(2.1) the virtual vector space Z_a passes through the convolution layer to obtain a feature map F;
(2.2) the output of the channel attention module applied to the feature map F is multiplied element-wise (tensor multiplication) with F to obtain the feature map F_c; the output of the spatial attention module applied to F_c is multiplied with F_c to obtain F_s;
(2.3) the result obtained after reshaping the sample label L is multiplied with the feature map F, and the product is then added (matrix addition) to the feature maps F_c and F_s in turn to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain a virtual image, which is input to the sample space discriminator D_B.
The operation formulas of step (2.2) are:

F_c = \sigma(M_c(F)) \otimes F

F_s = \sigma(M_s(F_c)) \otimes F_c

where ⊗ denotes tensor (element-wise) multiplication, M_c denotes the channel attention module, M_s denotes the spatial attention module, and σ denotes the sigmoid function. The operation formula of step (2.3) is:

F^* = (\mathrm{reshape}(L) \otimes F) \oplus F_c \oplus F_s

where ⊕ denotes matrix addition.
The sample space discriminator D_B in step (5) comprises a first-stage convolution layer, a first-stage pooling layer, a second-stage convolution layer, a second-stage pooling layer, a fully connected layer and the sample space discriminator output layer. The specific operation steps are:
(3.1) the real picture or the virtual image passes through the first-stage convolution layer and the first-stage pooling layer in turn to obtain a feature map y*;
(3.2) the feature map y* passes through the second-stage convolution layer and the second-stage pooling layer in turn, and the result is passed through the sample space discriminator output layer to give the output y_b.
The operation formulas of step (3.1) and step (3.2) are respectively:

y^* = \mathrm{MaxPool}(\mathrm{Conv}(p))

y_b = \sigma(\mathrm{MaxPool}(\mathrm{Conv}(y^*)))

where MaxPool is the pooling operation and p is the input real picture or virtual image. The two stages of convolution and pooling improve the accuracy and effectiveness of the discrimination.
The sample space generator G_B and the sample space discriminator D_B of the sample space self-adversarial module are shown in detail in Fig. 5 and Fig. 6; their training loss functions L_{G_B} and L_{D_B} are:

L_{G_B} = -\mathbb{E}_{z \sim P_z}[\log D_B(G_B(G_A(z)))]

L_{D_B} = -\mathbb{E}_{x \sim P_r}[\log D_B(x)] - \mathbb{E}_{z \sim P_z}[\log(1 - D_B(G_B(G_A(z))))]

where x is the real picture and z is random noise.
Finally, the total training loss function L of the overall model is:

L = L_E + L_{G_A} + L_{D_A} + L_{G_B} + L_{D_B}
the invention has wide application fields, for example, when the classification problem on a production line is processed, the number of samples provided by manufacturers is often insufficient, and a new sample image is required to be correspondingly generated according to the existing samples. The method has low requirement on the number of samples, and the added self-attention module reduces the number of capsules in a capsule network, even reduces the number of capsules to 2% of the original number in efficiency-CapsNet, thereby greatly improving the working efficiency of an encoder; the method has more control, pertinence and accuracy on the generation of the image, and can generate the image similar to a sample provided by a manufacturer better and more clearly.
In the experiments, the operating system is Ubuntu 18.04, the CPU is an AMD Ryzen 5 2600 Six-Core Processor at 3.85 GHz, the programming language is Python 3.6, the graphics card is an NVIDIA GeForce RTX 2070 Super, and the deep learning framework is PyTorch 1.2. The MNIST dataset is divided into a training set and a test set, each containing 10 classes of handwritten digits in the range 0 to 9 (which serve as the sample labels), and each sample image is 28 × 28 pixels. The results of the comparison experiments between the present invention and other advanced adversarial learning networks on the MNIST dataset are shown in Fig. 7; the evaluation metrics of the comparison experiments are as follows:
IS (Inception Score) is used to evaluate the quality (clarity) of the generated images, with larger values being better:

\mathrm{IS} = \exp\big(\mathbb{E}_{x}\, D_{KL}(p(y \mid x)\,\|\,p(y))\big)

FID (Frechet Inception Distance) is used to evaluate the diversity of the generated images, with smaller values being better:

\mathrm{FID} = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)
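For reference, a hedged sketch of computing FID from Inception activation statistics (the means and covariances of real and generated features); the Inception feature extractor itself is assumed and omitted here.

```python
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """feats_*: (num_samples, feature_dim) arrays of Inception activations."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)   # (Sigma_r Sigma_g)^(1/2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(((mu_r - mu_g) ** 2).sum() + np.trace(cov_r + cov_g - 2 * covmean))
```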
In summary, the dual generative adversarial learning method based on a capsule network and mixed attention according to the embodiment of the invention is a novel dual adversarial learning method with self-attention, a capsule network and mixed attention. Unlike previous methods, it uses a mixed attention module to weight the extracted features, so that the influence of non-transferable features can be effectively suppressed. By considering the transferability of different regions and resolutions, the method further extracts complex multimodal structural information from the features of the whole image, achieving finer image generation. Comparison and ablation experiments on the benchmark dataset also demonstrate the feasibility and effectiveness of the method.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (8)

1. A dual generative adversarial learning method based on a capsule network and mixed attention, characterized in that it comprises an autoencoder module E based on self-attention and a capsule network, a vector space self-adversarial module and a sample space self-adversarial module;
the autoencoder module E based on self-attention and a capsule network comprises an autoencoder input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-adversarial module comprises a vector space generator G_A and a vector space discriminator D_A;
the sample space self-adversarial module comprises a sample space generator G_B and a sample space discriminator D_B.
The method comprises the following specific steps:
(1) preprocessing a real picture input to the autoencoder input layer, and randomly sampling random noise z and a sample label L from the feature distribution of the real picture;
(2) the autoencoder module E encodes the real picture to obtain a real vector space Z_e composed of the final capsule layer, which is input to the vector space discriminator D_A; the vector space generator G_A generates a near-real virtual vector space Z_a from the random noise z and the sample label L extracted in step (1), which is input to the vector space discriminator D_A and to the sample space generator G_B;
(3) the vector space discriminator D_A judges whether the vector space input to it in step (2) is the real vector space Z_e or the virtual vector space Z_a, and feeds the judgment back to the vector space discriminator D_A and the vector space generator G_A;
(4) the sample space generator G_B generates a virtual image from the virtual vector space Z_a input in step (2) and the sample label L extracted in step (1), and inputs the virtual image to the sample space discriminator D_B; the real picture from step (1) is also input to the sample space discriminator D_B;
(5) the sample space discriminator D_B judges whether the image input to it in step (4) is a real picture or a virtual image, and feeds the judgment back to the sample space discriminator D_B and the sample space generator G_B.
2. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the specific operation steps of the autoencoder module E based on self-attention and a capsule network are:
(1.1) inputting a real picture at the autoencoder input layer;
(1.2) performing a parallel convolution operation on the real picture through the parallel convolution layer to obtain feature maps of the real picture;
(1.3) repeatedly extracting and compressing the information in the feature maps of step (1.2) through the self-attention layer, and outputting the result as the primary capsule layer;
(1.4) further compressing the primary capsule layer into the final capsule layer through a compression operation S, and feeding the real vector space Z_e formed by the final capsule layer to the vector space discriminator D_A;
in step (1.2), convolution kernels of size 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are applied in parallel, with the operation formula:

T = \mathrm{ReLU}(\mathrm{Conv}_{k \times k}(x))

where x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature maps obtained by the parallel convolution operation of step (1.2);
in step (1.3), the operation formulas for repeatedly extracting and compressing the feature maps obtained in step (1.2) through the self-attention layer are:

T_n = \mathrm{Attention}(T_{n-1})

U = \sum_{i=1}^{n} T_i

where Attention denotes the self-attention module, T_n denotes the result obtained after the n-th extraction and compression, T_{n-1} denotes the result obtained after the (n-1)-th extraction and compression, i = 1, 2, …, n, and U denotes the primary capsule layer;
the compression operation formula in step (1.4) is:

Z_e = \mathrm{squash}(U) = \frac{\|U\|^2}{1 + \|U\|^2} \cdot \frac{U}{\|U\|}
3. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the training loss functions L_{G_A} and L_{D_A} of the vector space generator G_A and the vector space discriminator D_A of the vector space self-adversarial module are respectively:

L_{G_A} = -\mathbb{E}_{z \sim P_z}[\log D_A(G_A(z))]

L_{D_A} = -\mathbb{E}_{x \sim P_r}[\log D_A(E(x))] - \mathbb{E}_{z \sim P_z}[\log(1 - D_A(G_A(z)))]

where x is the real picture, z is random noise, P_r is the distribution of the data samples, P_z is the distribution of the random noise, E(x) is the result of passing the real picture through the autoencoder module, E_{z~P_z} denotes the expectation with z drawn from P_z, and E_{x~P_r} denotes the expectation with x drawn from P_r.
4. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the vector space generator G_A comprises a vector space generator input layer, three hidden layers and a linear output layer; the three hidden layers are fully connected layers activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
5. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the vector space discriminator D_A comprises a vector space discriminator input layer, three hidden layers and a nonlinear output layer; the three hidden layers are fully connected layers with batch normalization and the ReLU activation function (BN + ReLU), with 512, 1024 and 512 neurons respectively.
6. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the training loss functions L_{G_B} and L_{D_B} of the sample space generator G_B and the sample space discriminator D_B of the sample space self-adversarial module are respectively:

L_{G_B} = -\mathbb{E}_{z \sim P_z}[\log D_B(G_B(G_A(z)))]

L_{D_B} = -\mathbb{E}_{x \sim P_r}[\log D_B(x)] - \mathbb{E}_{z \sim P_z}[\log(1 - D_B(G_B(G_A(z))))]

where x is the real picture, z is random noise, P_r is the distribution of the data samples, P_z is the distribution of the random noise, E_{z~P_z} denotes the expectation with z drawn from P_z, and E_{x~P_r} denotes the expectation with x drawn from P_r.
7. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the sample space generator G_B comprises a convolution layer with a 1 × 1 convolution kernel, a mixed attention module and a deconvolution layer, the mixed attention module comprising a channel attention module and a spatial attention module, with the specific operation steps:
(2.1) the virtual vector space Z_a passes through the convolution layer to obtain a feature map F;
(2.2) the output of the channel attention module applied to the feature map F is multiplied element-wise (tensor multiplication) with F to obtain the feature map F_c; the output of the spatial attention module applied to F_c is multiplied with F_c to obtain F_s;
(2.3) the result obtained after reshaping the sample label L is multiplied with the feature map F, and the product is then added (matrix addition) to the feature maps F_c and F_s in turn to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain a virtual image, which is input to the sample space discriminator D_B;
the operation formulas of step (2.2) are:

F_c = \sigma(M_c(F)) \otimes F

F_s = \sigma(M_s(F_c)) \otimes F_c

where ⊗ denotes tensor (element-wise) multiplication, M_c denotes the channel attention module, M_s denotes the spatial attention module, and σ denotes the sigmoid function;
the operation formula of step (2.3) is:

F^* = (\mathrm{reshape}(L) \otimes F) \oplus F_c \oplus F_s

where ⊕ denotes matrix addition.
8. The dual generative adversarial learning method based on a capsule network and mixed attention according to claim 1, characterized in that the sample space discriminator D_B comprises a first-stage convolution layer, a first-stage pooling layer, a second-stage convolution layer, a second-stage pooling layer, a fully connected layer and a sample space discriminator output layer, with the specific operation steps:
(3.1) the real picture or the virtual image passes through the first-stage convolution layer and the first-stage pooling layer in turn to obtain a feature map y*;
(3.2) the feature map y* passes through the second-stage convolution layer and the second-stage pooling layer in turn, and the result is passed through the sample space discriminator output layer to give the output y_b;
the operation formulas of step (3.1) and step (3.2) are respectively:

y^* = \mathrm{MaxPool}(\mathrm{Conv}(p))

y_b = \sigma(\mathrm{MaxPool}(\mathrm{Conv}(y^*)))

where MaxPool is the pooling operation and p is the input real picture or virtual image.
CN202110690163.7A 2021-06-22 2021-06-22 Dual generative adversarial learning method based on capsule network and mixed attention Pending CN113378949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110690163.7A CN113378949A (en) 2021-06-22 2021-06-22 Dual generative adversarial learning method based on capsule network and mixed attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110690163.7A CN113378949A (en) 2021-06-22 2021-06-22 Dual generative adversarial learning method based on capsule network and mixed attention

Publications (1)

Publication Number Publication Date
CN113378949A true CN113378949A (en) 2021-09-10

Family

ID=77578325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110690163.7A Pending CN113378949A (en) 2021-06-22 2021-06-22 Dual generative adversarial learning method based on capsule network and mixed attention

Country Status (1)

Country Link
CN (1) CN113378949A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633655A (en) * 2019-08-29 2019-12-31 河南中原大数据研究院有限公司 Attention-attack face recognition attack algorithm
CN113780468A (en) * 2021-09-28 2021-12-10 中国人民解放军国防科技大学 Robust model training method based on small number of neuron connections
CN113780468B (en) * 2021-09-28 2022-08-09 中国人民解放军国防科技大学 Robust image classification model training method based on small number of neuron connections
CN115937994A (en) * 2023-01-06 2023-04-07 南昌大学 Data detection method based on deep learning detection model

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN113378949A (en) Dual generative adversarial learning method based on capsule network and mixed attention
CN113674140B (en) Physical countermeasure sample generation method and system
CN110543846A (en) Multi-pose face image obverse method based on generation countermeasure network
CN110175248B (en) Face image retrieval method and device based on deep learning and Hash coding
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
Wang et al. Sketchembednet: Learning novel concepts by imitating drawings
Singh et al. Steganalysis of digital images using deep fractal network
CN109800768B (en) Hash feature representation learning method of semi-supervised GAN
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN113642621A (en) Zero sample image classification method based on generation countermeasure network
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
Hongmeng et al. A detection method for deepfake hard compressed videos based on super-resolution reconstruction using CNN
CN114724189A (en) Method, system and application for training confrontation sample defense model for target recognition
CN109508640A (en) Crowd emotion analysis method and device and storage medium
Sharma et al. Deepfakes Classification of Faces Using Convolutional Neural Networks.
Hoque et al. Bdsl36: A dataset for bangladeshi sign letters recognition
CN114170659A (en) Facial emotion recognition method based on attention mechanism
CN112560668A (en) Human behavior identification method based on scene prior knowledge
CN113658285B (en) Method for generating face photo to artistic sketch
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
CN110188706B (en) Neural network training method and detection method based on character expression in video for generating confrontation network
Ge et al. Multi-grained cascade adaboost extreme learning machine for feature representation
Zhang Detect forgery video by performing transfer learning on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination