CN116468083A - Transformer-based generative adversarial network method - Google Patents

Transformer-based generative adversarial network method

Info

Publication number
CN116468083A
CN116468083A CN202310469413.3A
Authority
CN
China
Prior art keywords
gan
transformer
xpca
data
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310469413.3A
Other languages
Chinese (zh)
Inventor
郝思媛
翟世杰
夏裕凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Technology
Original Assignee
Qingdao University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Technology filed Critical Qingdao University of Technology
Priority to CN202310469413.3A priority Critical patent/CN116468083A/en
Publication of CN116468083A publication Critical patent/CN116468083A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/0475 — Generative networks
    • G06N3/045 — Combinations of networks
    • G06N3/094 — Adversarial learning
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • Y04S — SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a Transformer-based generative adversarial network (Generative Adversarial Network, GAN) method for the field of hyperspectral image classification (Hyperspectral Image Classification, HIC). The method introduces a Transformer into the GAN and proposes a generative adversarial network with a residual upscale module (Transformer with Residual Upscale GAN, TRUG) for HIC. The TRUG includes a generator G and a discriminator D. In G, we propose a Residual Upscale (RU) module that can increase the resolution of the generated image. In D, we employ Transformer Blocks of progressively decreasing scale and use a grid self-attention mechanism in the first layer to better extract image features. In addition, GAN is prone to training instability; to solve this problem we improved the normalization algorithm and added relative position encoding. TRUG is the first Transformer-based GAN to be applied to HIC.

Description

Transformer-based generative adversarial network method
Technical Field
The invention relates to a hyperspectral image classification method, in particular to a Transformer-based method for generating adversarial networks (Generative Adversarial Network, GAN), and belongs to the technical field of remote sensing information processing.
Background
With the development of technology, hyperspectral image classification (Hyperspectral Image Classification, HIC) has been widely used in many applications. In recent years, Deep Learning (DL) models have been applied to the HIC field.
As deep learning progresses and model parameters increase, overfitting becomes a significant challenge. To alleviate this problem, Zhang et al. focused on developing a simple network: they proposed a 1D capsule network that is easy to implement and lighter than ordinary 3D convolution. Mou et al., however, argued that one-dimensional convolution may lose pixel information when representing hyperspectral pixels, and therefore proposed a novel recurrent neural network (Recurrent Neural Network, RNN) structure. However, RNNs are inefficient at processing sequence information. When processing sequential data, a Transformer with an attention mechanism handles sequences more efficiently than an RNN. Currently, learning image features by combining a Transformer with a CNN is a relatively common approach. However, the Transformer has a large number of parameters and overfits very easily when trained on small-sample data such as HSI. An important way to alleviate overfitting is to add training data, and many researchers have done so, specifically through data flipping, cropping, translation, and generative models. A generative model alleviates the problem by generating high-quality samples. GAN is a typical generative model consisting mainly of a generator G and a discriminator D; it can fundamentally address the scarcity of data samples and thereby the overfitting problem. Therefore, many researchers have designed GANs to alleviate the problem of insufficient samples. Zhu et al. used a 1D GAN as a spectral classifier and a 3D GAN as a spatial classifier. In addition, many researchers have combined GAN with other techniques. However, GAN still suffers from training data imbalance and mode collapse.
To solve the problem of training data imbalance, Wang et al. adapted D into a single classifier and proposed an adaptive DropBlock regularization method to address mode collapse.
GAN has the disadvantage of instability, and most researchers working on this problem introduce various regularization methods but rarely change the network structure. For CNNs, the convolution operator has a local receptive field, so CNNs cannot capture long-range dependencies, whereas HSI contains rich spectral sequence information. This method therefore uses a Transformer as the basic framework, which is better suited to processing global information and is also good at processing sequence information. Currently, in the HIC field, no one has introduced a Transformer into GAN. Thus, the method combines the ideas of the Transformer and the GAN, and proposes a generative adversarial network with a residual upscale module (Transformer with Residual Upscale GAN, TRUG).
Disclosure of Invention
The present invention introduces a Transformer into the GAN and proposes a Transformer-based generative adversarial network with a residual upscale module (Transformer with Residual Upscale GAN, TRUG) for HIC. The TRUG contains a generator G and a discriminator D. In G, we propose a Residual Upscale (RU) module that can increase the resolution of the generated image. In D, we use Transformer Blocks of progressively decreasing scale and a grid self-attention mechanism in the first layer to better extract image features. In addition, GAN is prone to training instability; to solve this problem we improved the normalization algorithm and added relative position encoding.
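The attention with relative position encoding used in the discriminator can be illustrated with a minimal single-head sketch. This is not the patent's implementation: the function name, the shapes, and the use of a single shared matrix for queries, keys and values are simplifying assumptions made here for illustration.

```python
import numpy as np

def self_attention(x, rel_bias):
    """Single-head scaled dot-product attention with an additive relative
    position bias term (a simplified stand-in for grid attention in D)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d) + rel_bias              # (n, n) attention logits
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)        # softmax over keys
    return attn @ x                                        # weighted sum of value rows

rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 8))       # 6 tokens of dimension 8
bias = rng.standard_normal((6, 6)) * 0.1   # hypothetical relative position bias
out = self_attention(tokens, bias)
print(out.shape)  # (6, 8)
```

In practice queries, keys and values would come from separate learned projections, and the bias would be indexed from a learned table by relative offset.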
The method comprises the following specific steps:
s1, performing dimension reduction on the original data through PCA to obtain Xpca, and inputting Xpca into discriminator D to learn the features of the real samples;
s2, dividing Xpca into several patches in discriminator D and embedding the patches;
s3, inputting the embedded data into a Transformer Block to learn its features, then downsampling the obtained features to reduce their size, and repeating this step three times to obtain the final discriminative features;
s4, inputting one-dimensional random noise Z ∈ R^{B×L} and class labels C into generator G, reconstructing the noise Z into a feature map X ∈ R^{B×H×W×C} with resolution (H×W) through a Multi-Layer Perceptron (MLP), and inputting the obtained feature map X into a Transformer Block to further extract features;
s5, improving the resolution of the feature map obtained in S4 through a Residual Upscale module (RU), whose specific step is: taking the Kronecker product of the pre-module feature map X and the post-module feature map X_new to generate a high-resolution X_up, formulated as:
X ⊗ X_new = X_up
s6, inputting the feature map X_up obtained in S5 into a Swin Transformer (ST) to further extract features X_st between different windows, and further improving the resolution of X_st through the RU module to obtain features X_stnew;
s7, compressing the channel dimension of X_stnew to be consistent with the channel dimension of Xpca to obtain a false sample Fake_data ∈ R^{B×M×N×C};
S8: false sample Fake to be generated data And (3) inputting the identification features obtained in the step (S3) into a discriminator (D) together with the real sample Xpca, classifying and distinguishing true and false to obtain a final classification result, and simultaneously, returning the Loss of the classification result and the true and false to a generator to enable the generator to continuously learn to generate a sample with higher quality.
Compared with the prior art, the technical scheme of the invention has the following technical effects:
(1) The GAN can generate false images similar to the real data, thereby alleviating the problem of scarce training samples;
(2) The RU in TRUG can improve image resolution, enabling G to generate high-quality samples;
(3) The network based on the Swin Transformer basic module can obtain different feature information through window exchange;
(4) Compared with traditional GAN, TRUG combines a loss function for real/fake discrimination and classification, which can alleviate the mode-collapse problem of traditional GAN training;
(5) TRUG is the first Transformer-based GAN applied to HIC. Compared with common GANs, it achieves higher HSI classification accuracy;
(6) Expressive feature information is enhanced by the attention mechanism.
Drawings
Fig. 1 is a frame diagram of the TRUG of the present invention.
FIG. 2 is a visualization of samples of different sizes generated on the datasets IP and PU.
FIG. 3 shows the OA of samples generated on different datasets at different sizes.
FIG. 4 shows the OA on the IP and PU datasets with and without the samples generated by the RU module.
FIG. 5 is a visual comparison of classification maps obtained by different methods on the IP dataset; (a) false color image, (b) ground truth, (c) SVM, (d) CNN, (e) 3D CNN, (f) HybridSN, (g) DPRN, (h) Transformer, (i) ViT, (j) TRUG.
FIG. 6 is a visual comparison of classification maps obtained by different methods on the UP dataset; (a) false color image, (b) ground truth, (c) SVM, (d) CNN, (e) 3D CNN, (f) HybridSN, (g) DPRN, (h) Transformer, (i) ViT, (j) TRUG.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort shall fall within the scope of the invention.
Fig. 1 is a frame diagram of the TRUG of the present invention.
We selected two published HSI datasets, Indian Pines (IP) and University of Pavia (UP), to verify the validity of the proposed method.
All datasets are divided into two parts, a training set and a test set. Since GAN is very sensitive to small samples, we sampled each class and chose 10% of the samples of each class for training. The experimental results are reported using three evaluation criteria: Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient (Kappa). In addition, to avoid biased estimates, 10 independent tests were performed using PyTorch on a computer equipped with an Intel Core i5 processor and an RTX 3090 GPU.
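The three evaluation criteria can all be derived from a confusion matrix, as in this minimal sketch (the function name and the toy label arrays are our own, not from the patent):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall Accuracy (OA), Average Accuracy (AA) and Kappa coefficient."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                         # confusion matrix
    total = cm.sum()
    oa = np.trace(cm) / total                                 # fraction correct overall
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()                # mean per-class recall
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                              # agreement beyond chance
    return oa, aa, kappa

oa, aa, kappa = classification_metrics([0, 0, 1, 1], [0, 0, 1, 0], 2)
print(oa, aa, kappa)  # 0.75 0.75 0.5
```

Kappa penalizes agreement attributable to chance, which is why it is reported alongside OA for imbalanced HSI class distributions.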
The specific steps for each test were as follows:
s1, reducing the dimension of the original data through PCA to obtain Xpca, and inputting Xpca into discriminator D to learn the features of the real samples;
s2, dividing Xpca into several patches in discriminator D and embedding the patches;
s3, inputting the embedded data into a Transformer Block to learn its features, then downsampling the obtained features to reduce their size, and repeating this step three times to obtain the final discriminative features;
s4, inputting one-dimensional random noise Z ∈ R^{B×L} and class labels C into generator G, reconstructing the noise Z into a feature map X ∈ R^{B×H×W×C} with resolution (H×W) through a Multi-Layer Perceptron (MLP), and inputting the feature map X into a Transformer Block to further extract features;
s5, improving the resolution of the feature map obtained in S4 through a Residual Upscale module (RU), whose specific step is: taking the Kronecker product of the pre-module feature map X and the post-module feature map X_new to generate a high-resolution X_up, formulated as:
X ⊗ X_new = X_up
s6, inputting the feature map X_up obtained in S5 into a Swin Transformer (ST) to further extract features X_st between different windows, and further improving the resolution of X_st through the RU module to obtain features X_stnew;
s7, compressing the channel dimension of X_stnew to be consistent with that of Xpca to obtain a false sample Fake_data ∈ R^{B×M×N×C};
s8, inputting the generated false sample Fake_data, the real sample Xpca, and the discriminative features obtained in S3 into discriminator D for classification and real/fake discrimination to obtain the final classification result; meanwhile, the real/fake and classification losses are backpropagated to the generator so that it continually learns to generate higher-quality samples.
To test the effectiveness of the present invention, ablation and comparative experiments were performed.
A. Ablation experiment
(1) Visual analysis of generated samples. The number of generated samples and the generated image size are important parameters. In the experiments we generated images of different sizes, 16 and 32, from the datasets. A visualization of the generated image features is shown in FIG. 2. Feature analysis has always been a challenge, particularly visual analysis. For quality assessment of the generated samples, the most intuitive approach is visual comparison with the real image. FIG. 2(a) shows, from top to bottom, a comparison of the real and generated (false) images on dataset PU from early to late training. FIG. 2(b) is a visualization of 32×32 false images generated from the IP dataset, displayed from top to bottom in order of training time. As can be seen from the figure, the real and false images differ significantly early in training, and similar parts appear in the middle and later stages. The learning process of the images can thus be observed during training.
For the size of the generated samples, we performed parameter experiments on the two datasets, as shown in FIG. 3. In general, a larger sample size gives a better classification effect, but hardware limitations prevented us from increasing the size further. As can be seen from the figure, the size of the generated samples has a large influence on the experimental results for different datasets. For dataset IP, a sample size of 64 yielded an OA of 94.56%, which is 14.49% higher than the OA at sample size 16; for dataset PU, the highest classification accuracy of 96.76% was obtained at sample size 16. This shows that different datasets reach their best accuracy at different scales. Thus, in subsequent experiments we used a size of 64 for IP and 16 for PU.
(2) RU analysis. We selected TransGAN, which uses the traditional UpScaling method for improving image resolution, as the comparison. As can be seen from FIG. 4, the classification accuracy of the GAN using the RU module is significantly higher than that using the conventional method. The experimental results show that the RU module is indeed effective.
B. Comparative experiments
We report the classification accuracy obtained by different methods on the IP and UP datasets. The comparison methods include SVM, CNN, 3D CNN, HybridSN, Deep Pyramidal Residual Networks (DPRN), and the recently proposed Transformer and ViT. In the experiments we used 10% of the samples as the training set.
It can be seen from Table 1 that TRUG is superior to all other methods. For dataset IP, the OA of the proposed method is much higher than that of SVM, CNN, 3D CNN and ViT, and about 4.43% higher than the OA obtained by Transformer, HybridSN and DPRN. The OA, AA and Kappa of TRUG are 97.85%, 97.67% and 97.55%, respectively, showing that TRUG classifies HSI better. Furthermore, for the UP dataset the proposed method achieves the best performance (OA = 99.83%, AA = 99.67%, Kappa = 99.77%), about 1% higher than other deep-learning-based methods and 4% to 5% higher than traditional methods. Classification maps for the different datasets are shown in FIG. 5 and FIG. 6; the classification map obtained by our method is clearer than those of the other methods.
TABLE 1
The foregoing is merely a specific embodiment of the present application and is not intended to limit the present application in any way; any simple modification, equivalent variation or adaptation made to the above embodiment according to the technical substance of the present application still falls within the scope of the technical solutions of the present application.

Claims (1)

1. A Transformer-based method for generating adversarial networks (Generative Adversarial Network, GAN), comprising the steps of:
S1: performing dimension reduction on the original data through PCA to obtain Xpca, and inputting Xpca into discriminator D to learn the features of the real samples;
S2: dividing Xpca into several patches in discriminator D and embedding the patches;
S3: inputting the embedded data into a Transformer Block to learn its features, then downsampling the obtained features to reduce their size, and repeating step S3 three times to obtain the final discriminative features;
S4: inputting one-dimensional random noise Z ∈ R^{B×L} and class labels C into generator G, reconstructing the noise Z into a feature map X ∈ R^{B×H×W×C} with resolution (H×W) through a Multi-Layer Perceptron (MLP), and inputting the obtained feature map X into a Transformer Block for further feature extraction;
S5: improving the resolution of the feature map obtained in S4 through a Residual Upscale module (RU), whose specific step is: taking the Kronecker product of the pre-module feature map X and the post-module feature map X_new to generate a high-resolution X_up;
S6: inputting the feature map X_up obtained in S5 into a Swin Transformer (ST) to further extract features X_st between its different windows, and further improving the resolution of X_st through the RU module to obtain features X_stnew;
S7: compressing the channel dimension of X_stnew to be consistent with that of Xpca to obtain a false sample Fake_data ∈ R^{B×M×N×C};
S8: inputting the generated false sample Fake_data, the real sample Xpca, and the discriminative features obtained in S3 into discriminator D, where a softmax performs classification and real/fake discrimination to obtain the final classification result; meanwhile, the real/fake and classification losses are backpropagated to the generator so that it continually learns to generate higher-quality samples.
CN202310469413.3A 2023-04-27 2023-04-27 Transformer-based generative adversarial network method Pending CN116468083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310469413.3A CN116468083A (en) 2023-04-27 2023-04-27 Transformer-based generative adversarial network method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310469413.3A CN116468083A (en) 2023-04-27 2023-04-27 Transformer-based generative adversarial network method

Publications (1)

Publication Number Publication Date
CN116468083A true CN116468083A (en) 2023-07-21

Family

ID=87180571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310469413.3A Pending CN116468083A (en) 2023-04-27 2023-04-27 Transformer-based network generation countermeasure method

Country Status (1)

Country Link
CN (1) CN116468083A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117250657A (en) * 2023-11-17 2023-12-19 东北石油大学三亚海洋油气研究院 Seismic data reconstruction denoising integrated method
CN117250657B (en) * 2023-11-17 2024-03-08 东北石油大学三亚海洋油气研究院 Seismic data reconstruction denoising integrated method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination