CN113378949A - Dual-generation confrontation learning method based on capsule network and mixed attention - Google Patents
- Publication number: CN113378949A
- Application number: CN202110690163.7A
- Authority
- CN
- China
- Prior art keywords
- layer
- vector space
- attention
- self
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/24 — Pattern recognition; Analysing; Classification techniques
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
Abstract
The invention relates to a dual generative adversarial learning method based on a capsule network and mixed attention, and belongs to the fields of artificial intelligence and image processing. The invention is an image generation method combining a capsule network, self-attention, mixed attention and adversarial learning. The sample space generator in the sample space self-adversarial module is a deep generative model based on mixed attention, and the sample space discriminator borrows from the LeNet-5 network. The invention improves the accuracy and efficiency of adversarial learning on the task of generating clear images from few training samples, reduces the training time and the number of training samples, and has stronger generalization ability; the effectiveness of the model is verified on MNIST and other benchmark datasets.
Description
Technical Field
The invention relates to a method for generating clear images from few samples, in particular to a dual generative adversarial learning method based on a capsule network and mixed attention, and belongs to the fields of artificial intelligence and image processing.
Background
Image generation is an important issue in computer vision. In computer vision research over the years, attention mechanisms have been extensively studied and have been used to improve the performance of modern deep neural networks. Attention mechanisms have proven useful in a variety of computer vision tasks, such as image generation and image classification.
Much recent work has proposed using channel attention, spatial attention, or both to improve the performance of these neural networks. These attention mechanisms improve the feature representations produced by standard convolutional layers by establishing correlations between channels (channel attention) or by weighting spatial locations (spatial attention). The intuition behind learning attention weights is to enable the network to learn where to attend and to focus further on the target object. This idea has been advanced further by introducing convolutions with large kernels to encode spatial information.
In 2017, Hinton et al. proposed the capsule network together with a dynamic routing algorithm, which updates the routing from the primary capsule layer to the digit capsule layer and was successfully applied to recognition on the MNIST dataset. A matrix capsule structure was then proposed, which uses a matrix to express the pose relations between objects and an EM algorithm to perform the dynamic updates between capsules. Recently, Efficient-CapsNet added an attention module between the capsule layers, reducing the number of primary capsules in the capsule network to 2% of the original and demonstrating the effectiveness of attention modules for improving capsule network performance.
At the same time, with the advent of the generative adversarial network (GAN), significant advances have been made on the image generation task (Goodfellow et al., 2014), but many problems remain unsolved. GANs based on deep convolutional networks have been particularly successful. However, careful inspection of the generated samples shows that, while advanced ImageNet GAN models are adept at generating image classes with few structural constraints (e.g. ocean, sky and landscape classes, which are distinguished more by texture than by geometry), they fail to capture the geometric or structural patterns that recur in certain classes.
A generative adversarial network comprises two models, a generator G and a discriminator D, which are trained simultaneously: D is trained to maximize the probability of correctly labeling both training samples and samples from G, while the parameters of the generator G are adjusted to minimize log(1 − D(G(z))).
In an unconditional generator, the mode in which the data is generated is not controllable. However, when the data is labeled, it can be convenient to use the label as a condition on the network input, as in CGAN. A related idea is to decompose the noise source into an incompressible source and a latent code by means of a variational autoencoder, trying to find the latent factors of feature variation by maximizing the mutual information between the latent code and the generator. Such latent codes can be used to discover object classes in an unsupervised manner, and the learned representations carry rich semantic information that can handle complex, entangled factors of image appearance, including pose, lighting and changes in the emotional content of facial images.
Variational autoencoders and generative adversarial networks have both matured considerably, but each has its own advantages and disadvantages. The variational autoencoder trains quickly and stably, but the generated images are blurry; generative adversarial networks often suffer from unstable training, mode collapse, incomplete extraction of intermediate feature information and loss of feature information. Capsule networks, likewise, have mostly been used for classification tasks, but their effectiveness on generation tasks has also been documented. The aim of future work is to combine the above deep network models so that they compensate for each other's weaknesses and thereby break through their respective limitations.
Disclosure of Invention
The invention aims to provide, in view of the defects and shortcomings of the prior art, a dual generative adversarial learning method based on a capsule network and mixed attention for the task of generating clear images from few training samples.
The technical scheme adopted by the invention is as follows: a dual generative adversarial learning method based on a capsule network and mixed attention comprises a self-encoder module E based on self-attention and a capsule network, a vector space self-adversarial module and a sample space self-adversarial module;
the self-encoder module E based on self-attention and a capsule network comprises a self-encoder module input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-adversarial module comprises a vector space generator G_A and a vector space discriminator D_A;
the sample space self-adversarial module comprises a sample space generator G_B and a sample space discriminator D_B;
On the basis of a basic generative adversarial learning model, the self-encoder module based on self-attention and a capsule network is applied to the construction of the real vector space, which improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required. The invention further adds a vector space self-adversarial module and a sample space self-adversarial module, improving the robustness of the whole framework and the clarity and realism of the finally generated virtual images.
Moreover, the basic architecture of the sample space generator and the sample space discriminator is improved: the sample space generator introduces a channel attention module and a spatial attention module into the feature mapping process, attending respectively to the information of the different channels of the convolution layer and to its spatial information, which reduces the loss of information during model training and improves the stability and accuracy of feature extraction; the sample space discriminator borrows the LeNet-5 network for discrimination, in which the primary and secondary convolution-pooling operations aim to improve the accuracy of the discrimination.
The overall method architecture is shown in fig. 1. The total training loss function L is:

L = L_E + L_GA + L_DA + L_GB + L_DB

wherein L_E is the loss function of the self-encoder, L_GA and L_DA are respectively the loss functions of the vector space generator G_A and discriminator D_A, and L_GB and L_DB are the training loss functions of the sample space generator G_B and discriminator D_B. The method comprises the following specific steps:
(1) preprocessing the real pictures input through the self-encoder module input layer, and randomly sampling the random noise z and the sample label L from the feature distribution of the real pictures;
(2) the self-encoder module E encodes a real picture to obtain the real vector space Z_e formed by the final capsule layer, which is input to the vector space discriminator D_A; the vector space generator G_A generates a near-real virtual vector space Z_a from the random noise z and the sample label L extracted in step (1), and inputs it to the vector space discriminator D_A and the sample space generator G_B;
(3) the vector space discriminator D_A judges whether the vector space Z input in step (2) is the real vector space Z_e or the virtual vector space Z_a, and feeds the judgment back to the vector space discriminator D_A and the vector space generator G_A;
(4) the sample space generator G_B generates a virtual image from the virtual vector space Z_a input in step (2) and the sample label L extracted in step (1), and inputs it to the sample space discriminator D_B; the real picture of step (1) is also input to the sample space discriminator D_B;
(5) the sample space discriminator D_B judges whether the image input in step (4) is a real picture or a virtual image, and feeds the judgment back to the sample space discriminator D_B and the sample space generator G_B.
As described in the background, the basic generative adversarial network is in fact a minimax game between the generator G and the discriminator D over the value function V(D, G):

min_G max_D V(D, G) = E_{x~P_data}[log D(x)] + E_{z~P_z}[log(1 − D(G(z)))]

wherein P_data is the feature distribution of the input samples, P_z is the distribution of the random noise, x is a real picture, z is random noise, E_{z~P_z} denotes the expectation over z drawn from P_z, and E_{x~P_data} denotes the expectation over x drawn from P_data.
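The minimax value above can be computed directly on toy data. The sketch below is illustrative only: a linear-sigmoid stand-in plays the role of D, and fixed Gaussian arrays stand in for P_data and G(z); it shows how the two expectation terms combine into V(D, G).

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    # Toy stand-in for D: sigmoid of a linear score, always in (0, 1).
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def gan_value(d_real, d_fake):
    # V(D, G) = E_{x~P_data}[log D(x)] + E_{z~P_z}[log(1 - D(G(z)))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

w = rng.normal(size=3)
x_real = rng.normal(loc=1.0, size=(8, 3))    # samples standing in for P_data
x_fake = rng.normal(loc=-1.0, size=(8, 3))   # samples standing in for G(z)

v = gan_value(discriminator(x_real, w), discriminator(x_fake, w))
# D is trained to push v up; G is trained to pull it down.
```

Since both log terms are logarithms of probabilities, V(D, G) is always negative, and a discriminator that separates real from fake well drives it toward 0.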
However, the data generation process can be guided by adding extra information to condition the model. The generative adversarial network can be generalized to a conditional generative model, provided that both the generator and the discriminator receive certain additional information y. Here y may be any type of auxiliary information, fed to the discriminator and generator as a control condition through an additional input layer. The objective function of the conditional generative adversarial network model is then:

min_G max_D V(D, G) = E_{x~P_data}[log D(x|y)] + E_{z~P_z}[log(1 − D(G(z|y)))]
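As a minimal illustration of this conditioning step (the one-hot label code y and the concatenation layout are assumptions for illustration, not the patent's exact interface), the label can simply be concatenated to the noise before it enters the generator:

```python
import numpy as np

NUM_CLASSES = 10  # e.g. the ten MNIST digit classes

def one_hot(label, num_classes=NUM_CLASSES):
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

def conditioned_input(z, label):
    # CGAN-style conditioning: the auxiliary information y becomes an
    # additional input by concatenating it to the noise z, giving [z; y].
    return np.concatenate([z, one_hot(label)])

z = np.random.default_rng(1).normal(size=100)
g_in = conditioned_input(z, label=3)  # 100 noise dims + 10 label dims
```

The discriminator receives its input conditioned in the same way, so both players of the minimax game see the same y.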
that is, the training loss function formulas of all generators and discriminators involved in the present invention are based on the above theory and formulas.
Further, the self-encoder module E in step (2) is composed of an attention-based capsule network, and the architecture thereof is shown in fig. 2 in detail. The specific operation steps comprise:
(1.1) inputting a real picture through the self-encoder module input layer;
(1.2) performing the parallel convolution operation on the real picture through the parallel convolution layer to obtain the feature maps of the real picture;
(1.3) repeatedly extracting and compressing the information in the feature maps of step (1.2) through the self-attention layer, and outputting the result as the primary capsule layer;
(1.4) further compressing the primary capsule layer into the final capsule layer through the compression operation, and feeding the real vector space Z_e formed by the final capsule layer into the vector space discriminator D_A.
The parallel convolution layer of this module performs the parallel convolution operation with 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels. On the one hand, parallel convolution with 4 different kernel sizes obtains position information of the real picture at different resolutions; on the other hand, it speeds up model training and reduces the parameters and complexity of the network. The parallel convolution layer yields 256 feature maps of size 4 × 4, where each group of 64 feature maps carries the picture spatial position information obtained by kernels of one size, and the operation formula is as follows:
T=ReLU(Convk×k(x))
wherein x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature map obtained after the parallel convolution operation.
Then, the 64 feature maps obtained with the same convolution kernel form one group (4 groups in total). The feature maps of each group pass n times through the self-attention module (Attention) for feature information extraction and compression, and the result feature matrices obtained each time are accumulated to form the primary capsule layer:

T_i = Attention(T_{i−1}),  U = T_1 + T_2 + … + T_n

wherein Attention denotes the self-attention module, T_i (i = 1, 2, …, n) denotes the result obtained after the i-th extraction and compression, T_0 = T is the output of the parallel convolution layer, and U is the primary capsule layer.
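The repeated extract-compress-accumulate step can be sketched as follows, with a toy dot-product self-attention standing in for the patent's Attention module (whose exact internal form is not specified here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(t):
    # Toy dot-product self-attention over a (rows, d) feature matrix:
    # every row is re-expressed as an attention-weighted mix of all rows.
    scores = t @ t.T / np.sqrt(t.shape[1])
    return softmax(scores, axis=1) @ t

def primary_capsule_layer(t0, n):
    # T_i = Attention(T_{i-1});  U = T_1 + T_2 + ... + T_n
    t, u = t0, np.zeros_like(t0)
    for _ in range(n):
        t = attention(t)
        u = u + t
    return u

t0 = np.random.default_rng(2).normal(size=(16, 4))  # one group of flattened feature maps
u = primary_capsule_layer(t0, n=3)
```

Accumulating the n intermediate results, rather than keeping only the last one, is what the summation U = Σ T_i in the text describes.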
Then, the primary capsule layer is compressed to obtain the final capsule layer, which forms the output layer Z_e; each capsule in the final capsule layer stores high-level characteristics of the corresponding category, such as pose, orientation and thickness. The compression (squash) operation formula is:

squash(u) = (‖u‖² / (1 + ‖u‖²)) · (u / ‖u‖)
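A sketch of the standard squash compression (as introduced with the original capsule network, and assumed here to match the patent's compression operation):

```python
import numpy as np

def squash(u, axis=-1, eps=1e-9):
    # squash(u) = (||u||^2 / (1 + ||u||^2)) * (u / ||u||):
    # short vectors shrink toward zero, long vectors approach unit length,
    # so a capsule's length can be read as a probability of existence.
    sq = np.sum(u * u, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * (u / np.sqrt(sq + eps))

v = squash(np.array([[0.0, 3.0, 4.0]]))  # input length 5 -> output length 25/26
```

The small eps guards against division by zero for all-zero capsules without noticeably changing the result elsewhere.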
In the overall model, the task of the self-encoder module is to map the samples of each class into a potential vector space, which expresses the characteristics of each class of samples to the maximum extent. To achieve this, the self-encoder model is trained with a method similar to training a classifier, i.e. by minimizing the margin loss function L_a:
L_a = T_a · max(0, m⁺ − ‖v_a‖)² + λ · (1 − T_a) · max(0, ‖v_a‖ − m⁻)²

wherein T_a indicates whether the predicted class is the current class, m⁺ is the upper bound of the loss (taken as 0.9), m⁻ is the lower bound of the loss (taken as 0.1), and v_a is the representative vector of the a-th category, whose length represents the probability that the category is present; the hyperparameter λ balances the ratio between the two terms. The loss L_E of the self-encoder is the average of L_a over the classes:

L_E = (1/C) · Σ_a L_a, with C the number of classes.
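The margin loss and its per-class average can be sketched directly (m⁺ = 0.9 and m⁻ = 0.1 as above; λ = 0.5 is an assumed value, since the patent does not fix it here):

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # L_a = T_a * max(0, m+ - ||v_a||)^2 + lambda * (1 - T_a) * max(0, ||v_a|| - m-)^2
    # v_norms: lengths of the class capsules; targets: one-hot labels (T_a).
    pos = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    neg = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return pos + neg

def encoder_loss(v_norms, targets):
    # L_E: average of the per-class margin losses.
    return margin_loss(v_norms, targets).mean()

# A confident, correct prediction incurs no loss...
good = encoder_loss(np.array([0.95, 0.05, 0.05]), np.array([1.0, 0.0, 0.0]))
# ...while a confident wrong prediction incurs a large one.
bad = encoder_loss(np.array([0.05, 0.95, 0.05]), np.array([1.0, 0.0, 0.0]))
```

The squared hinge terms leave a capsule untouched once its length is past the relevant bound, which is what lets the length be read as a class probability.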
further, a vector space generator G in the vector space self-countermeasure moduleASum vector space discriminator DASee fig. 3 and 4 for details of their training loss function LGAAnd LDAThe formulas are respectively as follows:
where x is the real picture, z is random noise, PrIs a distribution of data samples, PzFor the distribution of random noise, E (x) is the result of the real picture passing through the self-encoder module, Ez~PzDenotes z corresponds to PzDistribution of (E), Ex~PrIndicates that x corresponds to PrDistribution of (2).
The vector space generator G_A in step (2) comprises a vector space generator input layer, three vector space generator hidden layers and a linear output layer; the three hidden layers are fully-connected networks activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
The vector space discriminator D_A in step (3) comprises a vector space discriminator input layer, three vector space discriminator hidden layers and a nonlinear output layer; the three hidden layers are fully-connected networks with batch normalization and the ReLU activation function, with 512, 1024 and 512 neurons respectively.
Further, the sample space generator G_B and the sample space discriminator D_B in the sample space self-adversarial module are detailed in fig. 5 and fig. 6, and their training loss functions L_GB and L_DB are respectively:

L_GB = −E_{z~P_z}[log D_B(G_B(G_A(z)))]
L_DB = −E_{x~P_r}[log D_B(x)] − E_{z~P_z}[log(1 − D_B(G_B(G_A(z))))]

where z is random noise and G_B(G_A(z)) is the virtual image generated from the virtual vector space.
The sample space generator G_B in step (4) comprises a convolution layer with 1 × 1 convolution kernels, a mixed attention module and a deconvolution layer, wherein the mixed attention module comprises a channel attention module and a spatial attention module. The specific operation steps are as follows:
(2.1) the virtual vector space Z_a passes through the convolution layer to obtain a feature map F;
(2.2) the result of passing the feature map F through the channel attention module is tensor-multiplied with F to obtain the feature map F_c; the result of passing F_c through the spatial attention module is tensor-multiplied with F_c to obtain F_s;
(2.3) the result of reshaping the sample label L is tensor-multiplied with the feature map F, and the product is then added (matrix addition) in turn to the feature maps F_c and F_s to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain the virtual image, which is input to the sample space discriminator D_B.
The operation formulas of step (2.2) are:

F_c = σ(M_c(F)) ⊗ F,  F_s = σ(M_s(F_c)) ⊗ F_c

wherein ⊗ denotes tensor multiplication, M_c denotes the channel attention module, M_s denotes the spatial attention module, and σ denotes the sigmoid function. The operation formula of step (2.3) is:

F* = reshape(L) ⊗ F + F_c + F_s
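A sketch of the mixed attention gating: the pooled-statistics forms of M_c and M_s below are assumptions in the spirit of CBAM-style attention, not the patent's exact modules; only the gating pattern F_c = σ(M_c(F)) ⊗ F, F_s = σ(M_s(F_c)) ⊗ F_c follows the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f):
    # M_c: one gate per channel from globally pooled statistics (avg + max).
    # f has shape (C, H, W); the gate broadcasts over the spatial dims.
    stats = f.mean(axis=(1, 2)) + f.max(axis=(1, 2))
    return sigmoid(stats)[:, None, None]          # (C, 1, 1)

def spatial_attention(f):
    # M_s: one gate per spatial position from channel-pooled statistics.
    stats = f.mean(axis=0) + f.max(axis=0)
    return sigmoid(stats)[None, :, :]             # (1, H, W)

def mixed_attention(f):
    # F_c = sigma(M_c(F)) (x) F ;  F_s = sigma(M_s(F_c)) (x) F_c
    f_c = channel_attention(f) * f                # element-wise, broadcast
    f_s = spatial_attention(f_c) * f_c
    return f_c, f_s

f = np.random.default_rng(3).normal(size=(8, 4, 4))  # (channels, H, W) feature map
f_c, f_s = mixed_attention(f)
```

Because the gates lie in (0, 1), each stage can only attenuate feature responses, emphasizing informative channels first and informative positions second.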
The sample space discriminator D_B in step (5) comprises a primary convolution layer, a primary pooling layer, a secondary convolution layer, a secondary pooling layer, a fully-connected layer and a sample space discriminator output layer. The specific operation steps are as follows:
(3.1) the real picture or the virtual image passes through the primary convolution layer and the primary pooling layer in turn to obtain the feature map y*;
(3.2) the feature map y* passes through the secondary convolution layer and the secondary pooling layer in turn, and the result is passed through the sample space discriminator output layer to output the result y_b.

The operation formulas of step (3.1) and step (3.2) are respectively:

y* = MaxPool(Conv(p))
y_b = σ(MaxPool(Conv(y*)))

wherein MaxPool is the pooling operation, Conv is the convolution operation, σ is the sigmoid function, and p is the input real picture or virtual image. The primary and secondary convolution-pooling operations improve the accuracy and effectiveness of the discrimination.
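The two-stage convolution-pooling discrimination can be sketched with a tiny pure-NumPy convolution and max-pooling (single channel, arbitrary small kernels; the real LeNet-5-style layers are of course learned, and the kernels here are scaled down only to keep the sigmoid inputs in a reasonable range):

```python
import numpy as np

def conv2d(img, kernel):
    # Valid-mode 2D "convolution" (cross-correlation, as in deep learning).
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d(x, size=2):
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminate(p, k1, k2):
    # y* = MaxPool(Conv(p));  y_b = sigmoid(MaxPool(Conv(y*)))
    y_star = maxpool2d(conv2d(p, k1))
    return sigmoid(maxpool2d(conv2d(y_star, k2)))

rng = np.random.default_rng(4)
p = rng.normal(size=(28, 28))            # input picture (real or generated)
k1 = rng.normal(size=(5, 5)) / 5.0       # toy kernels, scaled for stability
k2 = rng.normal(size=(5, 5)) / 25.0
y_b = discriminate(p, k1, k2)            # 28 -> 24 -> 12 -> 8 -> 4 per side
```

Each conv-pool stage halves the spatial extent after shrinking it by the kernel, so a 28 × 28 MNIST-sized input ends as a 4 × 4 map of scores in (0, 1).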
The beneficial effects of the invention are: the self-encoder module based on self-attention and a capsule network is applied to the construction of the real vector space, which improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required; the added vector space self-adversarial module and sample space self-adversarial module improve the robustness of the whole framework and the clarity and realism of the finally generated virtual images; the sample space generator introduces a channel attention module and a spatial attention module into the feature mapping process, attending respectively to the information of the different channels of the convolution layer and to its spatial information, which reduces the loss of information during model training and improves the stability and accuracy of feature extraction; the sample space discriminator borrows the LeNet-5 network for discrimination, in which the primary and secondary convolution-pooling operations improve the accuracy of the discrimination.
Drawings
FIG. 1 is a block diagram of the dual generative adversarial learning method based on a capsule network and mixed attention;
FIG. 2 is the attention self-encoder E;
FIG. 3 is the potential vector space generator G_A;
FIG. 4 is the potential vector space discriminator D_A;
FIG. 5 is the sample space generator G_B;
FIG. 6 is the sample space discriminator D_B;
FIG. 7 shows the results of comparison experiments between the present invention and other state-of-the-art adversarial learning networks, using the MNIST dataset as an example.
Detailed Description
Example 1: the invention is further described with reference to the figures and to training on the MNIST dataset. A dual generative adversarial learning method based on a capsule network and mixed attention comprises a self-encoder module E based on self-attention and a capsule network, a vector space self-adversarial module and a sample space self-adversarial module;
the self-encoder module E based on self-attention and a capsule network comprises a self-encoder module input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-adversarial module comprises a vector space generator G_A and a vector space discriminator D_A;
the sample space self-adversarial module comprises a sample space generator G_B and a sample space discriminator D_B.
Fig. 1 is a framework diagram of an operation flow according to an embodiment of the present invention, where the method includes the following steps:
(1) preprocessing the real pictures input through the self-encoder module input layer, and randomly sampling the random noise z and the sample label L from the feature distribution of the real pictures;
(2) the self-encoder module E encodes a real picture to obtain the real vector space Z_e formed by the final capsule layer, which is input to the vector space discriminator D_A; the vector space generator G_A generates a near-real virtual vector space Z_a from the random noise z and the sample label L extracted in step (1), and inputs it to the vector space discriminator D_A and the sample space generator G_B;
(3) the vector space discriminator D_A judges whether the vector space Z input in step (2) is the real vector space Z_e or the virtual vector space Z_a, and feeds the judgment back to the vector space discriminator D_A and the vector space generator G_A;
(4) the sample space generator G_B generates a virtual image from the virtual vector space Z_a input in step (2) and the sample label L extracted in step (1), and inputs it to the sample space discriminator D_B; the real picture of step (1) is also input to the sample space discriminator D_B;
(5) the sample space discriminator D_B judges whether the image input in step (4) is a real picture or a virtual image, and feeds the judgment back to the sample space discriminator D_B and the sample space generator G_B.
Further, the self-encoder module E in step (2) is based on self-attention and a capsule network, and its architecture is shown in fig. 2. Applying the self-encoder module to the construction of the real vector space improves the accuracy and efficiency of image feature extraction for the image generation task and reduces the number of training samples required. The specific operation steps comprise:
(1.1) inputting a real picture through the self-encoder module input layer;
(1.2) performing the parallel convolution operation on the real picture through the parallel convolution layer to obtain the feature maps of the real picture;
(1.3) repeatedly extracting and compressing the information in the feature maps of step (1.2) through the self-attention layer, and outputting the result as the primary capsule layer;
(1.4) further compressing the primary capsule layer into the final capsule layer through the compression function squash, and feeding the real vector space Z_e formed by the final capsule layer into the vector space discriminator D_A.
In step (1.2), the parallel convolution layer of this module performs the parallel convolution operation with 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels; parallel convolution with 4 different kernel sizes obtains position information of the real picture at different resolutions, speeds up model training, and reduces the parameters and complexity of the network. The parallel convolution layer yields 256 feature maps of size 4 × 4, where each group of 64 feature maps carries the picture spatial position information obtained by kernels of one size, and the operation formula is as follows:
T=ReLU(Convk×k(x))
wherein x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature map obtained after the parallel convolution operation.
Then, in step (1.3), the 64 feature maps obtained with the same convolution kernel form one group (4 groups in total). The feature maps of each group pass n times through the self-attention module (Attention) for feature information extraction and compression, and the result feature matrices obtained each time are accumulated to form the primary capsule layer:

T_i = Attention(T_{i−1}),  U = T_1 + T_2 + … + T_n

wherein Attention denotes the self-attention module, T_i (i = 1, 2, …, n) denotes the result obtained after the i-th extraction and compression, T_0 = T is the output of the parallel convolution layer, and U is the primary capsule layer.
Then, in step (1.4), the primary capsule layer is compressed to obtain the final capsule layer, which forms the output layer Z_e; each capsule in the final capsule layer stores high-level characteristics of the corresponding category, such as pose, orientation and thickness. The compression (squash) operation formula is:

squash(u) = (‖u‖² / (1 + ‖u‖²)) · (u / ‖u‖)
In the overall model, the task of the self-encoder module is to map the samples of each class into a potential vector space, which expresses the characteristics of each class of samples to the maximum extent. To achieve this, the self-encoder model is trained with a method similar to training a classifier, i.e. by minimizing the margin loss function L_a:
L_a = T_a · max(0, m⁺ − ‖v_a‖)² + λ · (1 − T_a) · max(0, ‖v_a‖ − m⁻)²

wherein T_a indicates whether the predicted class is the current class, m⁺ is the upper bound of the loss (taken as 0.9), m⁻ is the lower bound of the loss (taken as 0.1), and v_a is the representative vector of the a-th category, whose length represents the probability that the category is present; the hyperparameter λ balances the ratio between the two terms. The loss L_E of the self-encoder is the average of L_a over the classes:

L_E = (1/C) · Σ_a L_a, with C the number of classes.
the invention also adds a vector space self-countermeasure module and a sample space self-countermeasure module, and improves the robustness of the whole framework and the definition and the reality of the finally generated virtual image. And, the basic architecture of the sample space generator and the sample space discriminator is improved: the sample space generator introduces a channel attention module and a space attention module in the feature mapping process, and focuses on information of different channels of the convolutional layer and space information of the convolutional layer respectively, so that the loss and the loss of the information in the model training process are reduced, and the stability and the accuracy of feature extraction are improved; the sample space discriminator uses LeNet-5 network for discrimination, wherein the first-stage and second-stage convolution pooling operations aim to improve the accuracy of discrimination.
Further, the vector space generator GA in step (2) comprises a vector space generator input layer, three vector space generator hidden layers and a linear output layer. The three hidden layers are fully-connected networks activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
The vector space discriminator DA in step (3) comprises a vector space discriminator input layer, three vector space discriminator hidden layers and a nonlinear output layer. The three hidden layers are fully-connected networks with batch normalization (BN) and ReLU activation, with 512, 1024 and 512 neurons respectively.
The vector space generator GA and the vector space discriminator DA in the vector space self-countermeasure module are shown in detail in fig. 3 and fig. 4. Their training loss functions LGA and LDA are respectively as follows:
where x is the real picture, z is random noise, Pr is the distribution of the data samples, Pz is the distribution of the random noise, E(x) is the result of passing the real picture through the self-encoder module, Ez~Pz denotes the expectation over z drawn from Pz, and Ex~Pr denotes the expectation over x drawn from Pr.
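The formula images for LGA and LDA did not survive extraction. Given the expectation notation Ez~Pz and Ex~Pr, a WGAN-style critic formulation consistent with the described roles of DA (score real codes E(x) against generated codes GA(z, L)) might look like the following sketch; the exact form used by the patent may differ:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Hypothetical WGAN-style critic loss for D_A: push scores of real
    # latent codes E(x) above scores of generated codes G_A(z, L).
    return np.mean(d_fake) - np.mean(d_real)

def g_loss(d_fake):
    # Generator loss for G_A: raise the critic's score on generated codes.
    return -np.mean(d_fake)

d_real = np.array([1.2, 0.9])       # toy critic outputs D_A(E(x))
d_fake = np.array([-0.3, 0.1])      # toy critic outputs D_A(G_A(z, L))
print(d_loss(d_real, d_fake), g_loss(d_fake))
```

The critic loss is negative when real codes already score higher than generated ones, and the generator loss falls as the generated codes fool the critic.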
Further, the sample space generator GB in step (4) comprises a convolution layer with a 1 × 1 convolution kernel, a mixed attention module and a deconvolution layer, where the mixed attention module comprises a channel attention module and a spatial attention module. The specific operation steps are as follows:
(2.1) the virtual vector space Za passes through the convolution layer to obtain a feature map F;
(2.2) the result obtained by passing the feature map F through the channel attention module is tensor-multiplied with F to obtain the feature map Fc; the result obtained by passing Fc through the spatial attention module is tensor-multiplied with Fc to obtain Fs;
(2.3) the result obtained by applying a reshape reconstruction to the sample label L is tensor-multiplied with the feature map F, and the product is then added by matrix addition to the feature maps Fc and Fs in turn to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain a virtual image, which is input to the sample space discriminator DB.
The operation formulas of step (2.2) are:

Fc = Mc(F) ⊗ F
Fs = Ms(Fc) ⊗ Fc

where ⊗ denotes tensor multiplication, Mc denotes the channel attention module, Ms denotes the spatial attention module, and σ denotes the sigmoid function. The operation formula of step (2.3) is:

F* = (reshape(L) ⊗ F) + Fc + Fs
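The composition of the mixed attention module can be sketched as follows. The channel and spatial gates here are simplified hypothetical stand-ins (real implementations such as CBAM use pooling plus small learned networks inside the sigmoid), but the tensor-multiplication and accumulation structure follows steps (2.2) and (2.3):

```python
import numpy as np

def channel_attention(f):
    # Hypothetical channel gate: sigmoid of each channel's mean (input C, H, W).
    s = f.mean(axis=(1, 2), keepdims=True)
    return 1.0 / (1.0 + np.exp(-s))

def spatial_attention(f):
    # Hypothetical spatial gate: sigmoid of the per-pixel mean over channels.
    s = f.mean(axis=0, keepdims=True)
    return 1.0 / (1.0 + np.exp(-s))

def mixed_attention(f, label_map):
    f_c = channel_attention(f) * f           # F_c = M_c(F) (x) F
    f_s = spatial_attention(f_c) * f_c       # F_s = M_s(F_c) (x) F_c
    return label_map * f + f_c + f_s         # F* = (reshape(L) (x) F) + F_c + F_s

f = np.random.rand(8, 4, 4)                  # toy feature map, C=8, H=W=4
lbl = np.ones((1, 4, 4))                     # toy reshaped label tensor
print(mixed_attention(f, lbl).shape)         # shape preserved: (8, 4, 4)
```

Broadcasting makes the channel gate (C, 1, 1) and spatial gate (1, H, W) act as elementwise tensor multiplications, so F* keeps the shape of F for the deconvolution layer.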
The sample space discriminator DB in step (5) comprises a first-stage convolution layer, a first-stage pooling layer, a second-stage convolution layer, a second-stage pooling layer, a fully-connected layer and a sample space discriminator output layer. The specific operation steps are as follows:
(3.1) the real picture or the virtual image passes through the first-stage convolution layer and the first-stage pooling layer in turn to obtain a feature map y*;
(3.2) the feature map y* passes through the second-stage convolution layer and the second-stage pooling layer in turn, and the result yb obtained is output through the sample space discriminator output layer.
The operation formulas of the step (3.1) and the step (3.2) are respectively as follows:
y* = MaxPool(Conv(p))
yb=σ(MaxPool(Conv(y*)))
where MaxPool is the pooling operation and p is the input real picture or virtual image. The two stages of convolution and pooling improve the accuracy and effectiveness of the discrimination.
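The two-stage convolution-pooling pipeline of DB can be sketched with a minimal NumPy implementation. The kernels here are toy values, and the final sigmoid over the pooled response stands in for the fully-connected output layer:

```python
import numpy as np

def conv2d(x, k):
    # Minimal 'valid' 2-D convolution (cross-correlation) for the sketch.
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i+kh, j:j+kw] * k) for j in range(w)]
                     for i in range(h)])

def maxpool2(x):
    # 2x2 max pooling with stride 2.
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2*h, :2*w].reshape(h, 2, w, 2).max(axis=(1, 3))

def discriminate(p, k1, k2):
    y_star = maxpool2(conv2d(p, k1))             # y* = MaxPool(Conv(p))
    logits = maxpool2(conv2d(y_star, k2))        # second conv-pool stage
    return 1.0 / (1.0 + np.exp(-logits.mean()))  # y_b = sigma(...)

p = np.random.rand(12, 12)                       # toy input "image"
k = np.ones((3, 3)) / 9.0                        # toy averaging kernels
print(discriminate(p, k, k))                     # probability in (0, 1)
```

The sigmoid output plays the role of yb: a score near 1 for inputs judged real and near 0 for virtual images.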
The sample space generator GB and the sample space discriminator DB in the sample space self-confrontation module are shown in detail in fig. 5 and fig. 6. Their training loss functions LGB and LDB are respectively as follows:
where z is random noise.
Finally, the overall training loss function L of the model is:
The invention has wide applications. For example, in classification problems on a production line, the number of samples provided by manufacturers is often insufficient, and new sample images must be generated from the existing samples. The method has a low requirement on sample count, and the added self-attention module reduces the number of capsules in the capsule network (in Efficient-CapsNet, even to 2% of the original number), greatly improving the efficiency of the encoder. Image generation is more controllable, targeted and accurate, and images closely resembling the samples provided by manufacturers can be generated more clearly.
In the experiments, the operating system is Ubuntu 18.04, the CPU is an AMD Ryzen 5 2600 Six-Core Processor at 3.85 GHz, the programming language is Python 3.6, the graphics card is an NVIDIA GeForce RTX 2070 Super, and the deep learning framework is PyTorch 1.2. The MNIST data set is divided into a training set and a test set, each containing 10 classes of handwritten digits in the range 0 to 9, which serve as the sample labels; each sample image is 28 × 28 pixels. The results of comparative experiments between the present invention and other advanced adversarial learning networks on the MNIST data set are shown in fig. 7. The evaluation metrics of the comparative experiments are as follows:
where IS is the Inception Score, used to assess the quality (sharpness) of the generated images, with larger values being better; FID is the Fréchet Inception Distance, used to evaluate the diversity of the generated images, with smaller values being better.
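As an illustration of the IS metric, here is a minimal sketch computing the Inception Score from per-image class probabilities; in practice these probabilities come from a pretrained Inception network rather than the toy inputs used here:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from per-image class probabilities p(y|x) (rows sum to 1).

    IS = exp( mean_x KL( p(y|x) || p(y) ) ); larger is better.
    """
    p_y = probs.mean(axis=0, keepdims=True)            # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# sharp, diverse predictions score near the class count; uniform ones score 1
sharp = np.eye(4)
print(inception_score(sharp))                          # ≈ 4.0
```

A generator whose samples are each confidently classified, yet cover all classes, maximizes the score; blurry or mode-collapsed samples pull it toward 1.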
In summary, the dual generation adversarial learning method based on capsule network and mixed attention according to the embodiment of the invention is a novel dual adversarial learning method with self-attention, a capsule network and mixed attention. Unlike previous methods, it uses a mixed attention module to weight the extracted features, so that the influence of non-transferable features can be effectively suppressed. By considering the transferability of different regions and resolutions, the method further extracts complex multimodal structure information from the features of the whole image to achieve finer image generation. Comparative and ablation experiments on benchmark data sets also demonstrate the feasibility and effectiveness of the method.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.
Claims (8)
1. A dual generation confrontation learning method based on capsule network and mixed attention, characterized in that it comprises: a self-encoder module E based on self-attention and capsule network, a vector space self-confrontation module and a sample space self-confrontation module;
the self-encoder module E based on the self-attention and capsule network comprises a self-encoder module input layer, a parallel convolution layer, a self-attention layer, a primary capsule layer and a final capsule layer;
the vector space self-confrontation module comprises a vector space generator GA and a vector space discriminator DA;
the sample space self-confrontation module comprises a sample space generator GB and a sample space discriminator DB;
The method comprises the following specific steps:
(1) a real picture input from the self-encoder module input layer is preprocessed, and random noise z and a sample label L are randomly sampled and extracted from the feature distribution of the real picture;
(2) the self-encoder module E encodes the real picture to obtain a real vector space Ze consisting of the final capsule layer, which is input to the vector space discriminator DA; the vector space generator GA generates a near-real virtual vector space Za from the random noise z and the sample label L extracted in step (1), which is input to the vector space discriminator DA and the sample space generator GB;
(3) the vector space discriminator DA judges whether the vector space input to it in step (2) is the real vector space Ze or the virtual vector space Za, and feeds back the judgment result to the vector space discriminator DA and the vector space generator GA;
(4) the sample space generator GB generates a virtual image from the virtual vector space Za input in step (2) and the sample label L extracted in step (1), and inputs the virtual image to the sample space discriminator DB; the real picture in step (1) is also input to the sample space discriminator DB;
(5) the sample space discriminator DB judges whether the input to it in step (4) is a real picture or a virtual image, and feeds back the judgment result to the sample space discriminator DB and the sample space generator GB.
2. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the self-encoder module E based on self-attention and capsule network performs the following operation steps:
(1.1) a real picture is input from the self-encoder module input layer;
(1.2) the real picture undergoes a parallel convolution operation through the parallel convolution layer to obtain feature maps of the real picture;
(1.3) the information in the feature maps of step (1.2) is repeatedly extracted and compressed through the self-attention layer, and the result is output as the primary capsule layer;
(1.4) the primary capsule layer is further compressed into the final capsule layer by a compression operation S, and the real vector space Ze formed by the final capsule layer is input to the vector space discriminator DA;
In the step (1.2), 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels are respectively adopted to perform parallel convolution operation, and the operation formula is as follows:
T=ReLU(Convk×k(x))
where x is the real picture, ReLU is the activation function, Conv is the convolution operation, k denotes the size of the convolution kernel, and T denotes the feature map obtained by the parallel convolution operation in step (1.2);
in step (1.3), the operation formula for repeatedly extracting and compressing the feature map obtained in step (1.2) from the attention layer is as follows:
Tn = Attention(Tn-1)
U = T1 + T2 + … + Tn
where Attention denotes the self-attention module, Tn denotes the result obtained after the n-th extraction and compression, Tn-1 the result obtained after the (n−1)-th, i = 1, 2, …, n, and U denotes the primary capsule layer;
the compression operation S in step (1.4) is the capsule squashing function:
S(s) = (||s||² / (1 + ||s||²)) · (s / ||s||)
3. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the training loss functions LGA and LDA of the vector space generator GA and the vector space discriminator DA of the vector space self-countermeasure module are respectively as follows:
where x is the real picture, z is random noise, Pr is the distribution of the data samples, Pz is the distribution of the random noise, E(x) is the result of passing the real picture through the self-encoder module, Ez~Pz denotes the expectation over z drawn from Pz, and Ex~Pr denotes the expectation over x drawn from Pr.
4. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the vector space generator GA comprises a vector space generator input layer, three vector space generator hidden layers and a linear output layer; the three hidden layers are fully-connected networks activated by the Leaky ReLU activation function, with 512, 1024 and 512 neurons respectively.
5. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the vector space discriminator DA comprises a vector space discriminator input layer, three vector space discriminator hidden layers and a nonlinear output layer; the three hidden layers are fully-connected networks with batch normalization (BN) and ReLU activation, with 512, 1024 and 512 neurons respectively.
6. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the training loss functions LGB and LDB of the sample space generator GB and the sample space discriminator DB of the sample space self-countermeasure module are respectively as follows:
where x is the real picture, z is random noise, Pr is the distribution of the data samples, Pz is the distribution of the random noise, Ez~Pz denotes the expectation over z drawn from Pz, and Ex~Pr denotes the expectation over x drawn from Pr.
7. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the sample space generator GB comprises a convolution layer with a 1 × 1 convolution kernel, a mixed attention module and a deconvolution layer, where the mixed attention module comprises a channel attention module and a spatial attention module; the specific operation steps are:
(2.1) the virtual vector space Za passes through the convolution layer to obtain a feature map F;
(2.2) the result obtained by passing the feature map F through the channel attention module is tensor-multiplied with F to obtain the feature map Fc; the result obtained by passing Fc through the spatial attention module is tensor-multiplied with Fc to obtain Fs;
(2.3) the result obtained by applying a reshape reconstruction to the sample label L is tensor-multiplied with the feature map F, and the product is then added by matrix addition to the feature maps Fc and Fs in turn to obtain the feature map F*;
(2.4) the feature map F* passes through the deconvolution layer to obtain a virtual image, which is input to the sample space discriminator DB;
the operation formulas of step (2.2) are:

Fc = Mc(F) ⊗ F
Fs = Ms(Fc) ⊗ Fc

where ⊗ denotes tensor multiplication, Mc denotes the channel attention module, Ms denotes the spatial attention module, and σ denotes the sigmoid function; the operation formula of step (2.3) is:

F* = (reshape(L) ⊗ F) + Fc + Fs
8. The dual generation antagonistic learning method based on capsule network and mixed attention of claim 1, characterized in that: the sample space discriminator DB comprises a first-stage convolution layer, a first-stage pooling layer, a second-stage convolution layer, a second-stage pooling layer, a fully-connected layer and a sample space discriminator output layer; the specific operation steps are:
(3.1) the real picture or the virtual image passes through the first-stage convolution layer and the first-stage pooling layer in turn to obtain a feature map y*;
(3.2) the feature map y* passes through the second-stage convolution layer and the second-stage pooling layer in turn, and the result yb obtained is output through the sample space discriminator output layer;
The operation formulas of the step (3.1) and the step (3.2) are respectively as follows:
y* = MaxPool(Conv(p))
yb=σ(MaxPool(Conv(y*)))
where MaxPool is the pooling operation and p is the input real picture or virtual image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110690163.7A CN113378949A (en) | 2021-06-22 | 2021-06-22 | Dual-generation confrontation learning method based on capsule network and mixed attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110690163.7A CN113378949A (en) | 2021-06-22 | 2021-06-22 | Dual-generation confrontation learning method based on capsule network and mixed attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113378949A true CN113378949A (en) | 2021-09-10 |
Family
ID=77578325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110690163.7A Pending CN113378949A (en) | 2021-06-22 | 2021-06-22 | Dual-generation confrontation learning method based on capsule network and mixed attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378949A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633655A (en) * | 2019-08-29 | 2019-12-31 | 河南中原大数据研究院有限公司 | Attention-attack face recognition attack algorithm |
CN113780468A (en) * | 2021-09-28 | 2021-12-10 | 中国人民解放军国防科技大学 | Robust model training method based on small number of neuron connections |
CN113780468B (en) * | 2021-09-28 | 2022-08-09 | 中国人民解放军国防科技大学 | Robust image classification model training method based on small number of neuron connections |
CN115937994A (en) * | 2023-01-06 | 2023-04-07 | 南昌大学 | Data detection method based on deep learning detection model |
CN118135402A (en) * | 2024-03-18 | 2024-06-04 | 临沂大学 | SAR target recognition method and related device based on multistage capsule fusion network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hu et al. | A novel image steganography method via deep convolutional generative adversarial networks | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
CN113378949A (en) | Dual-generation confrontation learning method based on capsule network and mixed attention | |
CN113674140B (en) | Physical countermeasure sample generation method and system | |
CN110543846A (en) | Multi-pose face image obverse method based on generation countermeasure network | |
CN110222668A (en) | Based on the multi-pose human facial expression recognition method for generating confrontation network | |
CN110175248B (en) | Face image retrieval method and device based on deep learning and Hash coding | |
CN112036260B (en) | Expression recognition method and system for multi-scale sub-block aggregation in natural environment | |
Wang et al. | Sketchembednet: Learning novel concepts by imitating drawings | |
Singh et al. | Steganalysis of digital images using deep fractal network | |
CN109800768B (en) | Hash feature representation learning method of semi-supervised GAN | |
CN111476133B (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
CN113642621A (en) | Zero sample image classification method based on generation countermeasure network | |
CN116311483B (en) | Micro-expression recognition method based on local facial area reconstruction and memory contrast learning | |
Hongmeng et al. | A detection method for deepfake hard compressed videos based on super-resolution reconstruction using CNN | |
CN114170659A (en) | Facial emotion recognition method based on attention mechanism | |
CN117351550A (en) | Grid self-attention facial expression recognition method based on supervised contrast learning | |
Sharma et al. | Deepfakes Classification of Faces Using Convolutional Neural Networks. | |
Hoque et al. | Bdsl36: A dataset for bangladeshi sign letters recognition | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN112560668A (en) | Human behavior identification method based on scene prior knowledge | |
CN116503753A (en) | Remote sensing image scene classification method based on multi-mode airspace transformation network | |
CN115294424A (en) | Sample data enhancement method based on generation countermeasure network | |
CN110188706B (en) | Neural network training method and detection method based on character expression in video for generating confrontation network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||