CN114202690A

CN114202690A - Multi-scale network analysis method based on mixed multilayer perceptron

Info

Publication number: CN114202690A
Application number: CN202111498048.6A
Authority: CN
Inventors: 李林辉; 林谋乐; 景维鹏; 陈广胜; 刘鹏; 李子游
Original assignee: Northeast Forestry University
Current assignee: Northeast Forestry University
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2022-03-18
Anticipated expiration: 2041-12-09
Also published as: CN114202690B

Abstract

The invention discloses a multi-scale network analysis method based on a mixed multilayer perceptron, comprising an MSC block and a UMLP block. The invention provides an efficient hyperspectral classification method that outperforms previous methods. The model size is only 0.185M, which suits industrial requirements. The method applies well to the hyperspectral field: it can monitor forest transition and give timely early warning of forest disasters such as fire.

Description

Multi-scale network analysis method based on mixed multilayer perceptron

Technical Field

The invention relates to the field of hyperspectral images, in particular to a multiscale network analysis method based on a mixed multilayer perceptron.

Background

Huertas et al. were the first to apply hyperspectral images to urban analysis, extracting the boundaries of urban buildings using geometric topology theory. Adams et al. analyzed spectral bands using mathematical theory and combined other factors to analyze hyperspectral data. In general, early hyperspectral image studies mainly used theoretical knowledge from disciplines such as geometry to analyze shallow information. State-of-the-art methods rely mostly on deep learning, represented by convolutional neural networks (CNNs), which have achieved outstanding results in semantic segmentation, classification, object detection and similar tasks. One contextual CNN framework explores local contextual interactions by jointly exploiting the local spatial-spectral relationships of neighboring pixels.

Undeniably, the CNN has become the de facto standard in deep learning. However, its parameter count grows exponentially as convolutional layers are added, and its size grows with the available computing power; a model typically needs more than 20M to reach acceptable expressive capability. In addition, because of the sustained multiply-add operations, computational cost is a bottleneck for industrial applications and cannot meet the real-time requirements of industry.

Prior art 2

Recently, Vision Transformers (ViT), which are based on self-attention layers, have achieved state-of-the-art performance in computer vision and attracted the attention of many researchers. However, the self-attention mechanism requires the computation of three matrices, and deeply stacked models consume many computing resources. This leads to disadvantages such as heavy computational cost and large model size. Tolstikhin demonstrated that CNNs are not required for deep learning. The MLP-Mixer framework employs two types of MLPs to mix the feature information and the spatial information of each location separately, and it is a significant research topic. By sacrificing a small amount of accuracy, model speed is significantly increased and model size is reduced. Although simple, it performs well in various fields and disciplines, and much research in the field of remote sensing builds on the MLP-Mixer to mine more meaningful analysis.

Sildir et al. proposed a superstructure-based mixed-integer nonlinear programming method for the optimal structural design of MLPs, covering selection of the number of neurons, pruning, and input selection. It achieved state-of-the-art performance in experiments on two public hyperspectral image data sets. In addition, the graph convolutional network (GCN) can embed non-Euclidean features from neighboring nodes, capturing the relationships between hyperspectral pixels. Lin used a GCN to convert features from a chaotic state to a highly cohesive state while reducing redundant information in the data.

Noise in hyperspectral images is also a challenging problem. UnDIP uses a geometric approach to extract endmembers and employs deep learning to estimate abundances, thereby addressing the noise problem of hyperspectral images. HyMiNoR is also an efficient denoising method, based on a novel sparse-noise framework. Furthermore, U-Net is a classical encoder-decoder architecture, in which the encoder embeds spatial and semantic information and the decoder mixes it with positional features.

Defects of the second prior art

Although these methods work well in hyperspectral classification, their models are computationally very expensive and their run time is long. Convolution operations are the root of these problems: they bring excellent performance in various fields, but also a heavy computational cost. The MLP-Mixer is a landmark study that involves only MLP operations and mixes all features, such as spatial information, by stacking layers. However, because its structure is simple, its expressive capability is weakened. The main reason is that semantic information between adjacent structures is ignored, so the semantic relationships between features cannot be captured well.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a multi-scale network analysis method based on a mixed multilayer perceptron.

The technical scheme of the invention is as follows: the multi-scale network analysis method based on the mixed multilayer perceptron comprises an MSC block and a UMLP block.

Preferably, the MSC block comprises the steps of:

step one: converting the channel dimension to 2^n (n is an integer) through a mixed MLP layer, so as to provide adapted data for the UMLP;

step two: mixing channel information using convolutional layers with convolution kernels of size 1 × 1;

step three: outputting, for each pixel of the hyperspectral image, an image-like patch formed from the mixed channel information;

step four: each row of the patch represents a different feature representation of the pixel generated by the convolution, namely Pixel-C;

step five: each column of the patch represents a summary of the original pixel channel values, namely Gen-C.

Preferably, the UMLP block comprises the steps of:

step one: stacking MSC layers to obtain a larger receptive field, so that the input (U, j, x) has global feature information in both the Pixel-C dimension and the Gen-C dimension;

step two: the MixerBlock module mixes semantic information in the two directions through two MLP layers, and then extracts Pixel-C dimension features through one MLP (dimension reduction).

The multi-scale network analysis method based on the mixed multilayer perceptron has the following beneficial effects:

1. The invention provides an efficient hyperspectral classification method that outperforms prior methods and therefore has commercial value.

2. The size of the model of the invention is only 0.185M, which can be well adapted to the industrial requirements.

3. The method can be well applied to the hyperspectral field, such as field transition analysis, city evolution and the like.

4. The invention can monitor forest transition and provide timely early warning of forest disasters such as fire.

Drawings

FIG. 1 is a diagram of the multi-scale U-shaped multilayer perceptron according to the present invention.

FIG. 2 is a data flow diagram of the present invention.

FIG. 3 is a visualization of the results of the present invention.

Detailed Description

The following description of embodiments of the present invention is provided to help those skilled in the art understand the invention. It should be understood that the invention is not limited to the scope of these embodiments. It will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined in the appended claims, and all inventions and creations that make use of the inventive concept are protected.

The invention is implemented with the existing deep learning framework PyTorch and corresponding programming libraries, mainly NumPy, Pandas, Tensor and the like. PyTorch is mainly used to build the deep learning model, which includes linear modules, convolution modules, parameter penalty modules and the like.

The specific scheme is realized according to the following principle:

In the proposed method, we convert the segmentation task into a classification task, performing pixel-level classification instead of patch segmentation. The method includes an MSC (multi-scale channel) block and a UMLP (U-shaped multilayer perceptron) block. To provide adapted data for the UMLP, the MSC block converts the channel dimension to 2^n (n is an integer) through the mixed MLP layer. The channel information is then mixed using convolutional layers with convolution kernels of size 1 × 1. For each pixel of the hyperspectral image, the block outputs an image-like patch. Each row of the patch represents a different feature representation of the pixel generated by the convolution, referred to below as Pixel-C. Each column represents a summary of the original pixel channel values, referred to below as Gen-C.
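As a rough illustration of the MSC data flow described above, the following NumPy sketch projects one pixel's spectrum to a power-of-two length (reading the channel dimension as 2^n) with a single linear layer, then applies a 1 × 1 convolution to obtain an image-like patch. The sizes (48 bands, 64 Gen-C columns, 16 Pixel-C rows), the random weights, the function name `msc_block`, and the choice of one input channel for the convolution are all assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def msc_block(spectrum, w_proj, b_proj, w_conv, b_conv):
    """Turn one pixel's spectrum of shape (C,) into a patch of shape (P, G)."""
    # Linear projection: C spectral bands -> G = 2**n values (Gen-C axis).
    gen_c = spectrum @ w_proj + b_proj                          # shape (G,)
    # 1 x 1 convolution with one input channel and P output channels:
    # each output channel scales and shifts every position independently,
    # so row p of the patch is one feature view of the pixel (Pixel-C axis).
    patch = w_conv[:, None] * gen_c[None, :] + b_conv[:, None]  # shape (P, G)
    return patch

rng = np.random.default_rng(0)
C, G, P = 48, 64, 16            # hypothetical sizes; G is a power of two
spectrum = rng.random(C)        # stand-in for one hyperspectral pixel
w_proj = rng.standard_normal((C, G)) * 0.1
b_proj = np.zeros(G)
w_conv = rng.standard_normal(P)
b_conv = np.zeros(P)

patch = msc_block(spectrum, w_proj, b_proj, w_conv, b_conv)
print(patch.shape)  # (16, 64)
```

In a PyTorch implementation the same two steps would typically be a linear layer followed by a convolution with kernel size 1; the explicit broadcasting here only makes the row and column roles visible.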

The MSC mainly serves two purposes: first, it extracts Pixel-C features, similar to a pooling operation; second, it mixes Gen-C information. A hyperspectral image has size H × W × C, where C is the number of spectral bands and H and W are the height and width of the image, respectively. Sampling is random: all pixels are shuffled, background pixels are excluded, and individual pixels are drawn by random sampling.
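The pixel sampling described above could look roughly like the sketch below. It assumes that a label map accompanies the H × W × C image and that background pixels carry the label 0; the function name `sample_pixels`, the background value, and the toy data are hypothetical and do not come from the patent.

```python
import numpy as np

def sample_pixels(image, labels, num_samples, background=0, seed=0):
    """Return (spectra, classes) for randomly chosen non-background pixels."""
    h, w, c = image.shape
    flat_pixels = image.reshape(h * w, c)    # one row per pixel
    flat_labels = labels.reshape(h * w)
    # Keep only the indices of labelled (non-background) pixels.
    candidates = np.flatnonzero(flat_labels != background)
    # Shuffle the candidates and take the first num_samples of them.
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(candidates)[:num_samples]
    return flat_pixels[chosen], flat_labels[chosen]

# Toy 4 x 5 image with 3 bands; label 0 marks background (an assumption).
image = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
labels = np.array([[0, 1, 1, 0, 2],
                   [0, 0, 2, 2, 1],
                   [1, 0, 0, 0, 0],
                   [0, 3, 3, 0, 0]])
spectra, classes = sample_pixels(image, labels, num_samples=6)
print(spectra.shape, classes.shape)  # (6, 3) (6,)
```

Each sampled spectrum is then an independent training example, which matches the pixel-level classification framing of the method.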

The Pixel-C dimension embeds only one feature of a pixel. Therefore, we use convolution operations with a 1 × 1 kernel to extract more semantic features. The various pieces of embedded information that summarize each pixel channel are named Gen-C.

The UMLP block consists of MixerBlock, U-shaped interpret, and skip-connection modules. The MSC layers are stacked to obtain a larger receptive field, so that the input (U, j, x) has global feature information in both the Pixel-C dimension and the Gen-C dimension. The MixerBlock module mixes semantic information in the two directions through two MLP layers, and then extracts Pixel-C dimension features through one MLP (dimension reduction), as shown in FIG. 1.
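The MixerBlock step can be sketched as follows, under several assumptions that the patent does not state: each MLP has two layers with a ReLU between them, the two mixing steps use residual additions as in the MLP-Mixer, and the final MLP halves the Pixel-C dimension. The names `mlp` and `mixer_block`, the parameter keys, and all sizes are hypothetical.

```python
import numpy as np

def mlp(x, w1, w2):
    """Two-layer perceptron applied along the last axis (ReLU is an assumption)."""
    return np.maximum(x @ w1, 0.0) @ w2

def mixer_block(patch, params):
    """patch has shape (P, G): P Pixel-C rows and G Gen-C columns."""
    # Mix along the Gen-C direction: every row goes through the same MLP.
    x = patch + mlp(patch, params["gen_w1"], params["gen_w2"])
    # Mix along the Pixel-C direction: transpose so that rows become the last axis.
    x = x + mlp(x.T, params["pix_w1"], params["pix_w2"]).T
    # Dimension reduction along Pixel-C with one more MLP: P rows -> P // 2 rows.
    return mlp(x.T, params["red_w1"], params["red_w2"]).T

rng = np.random.default_rng(0)
P, G, HIDDEN = 16, 64, 32       # hypothetical sizes
params = {
    "gen_w1": rng.standard_normal((G, HIDDEN)) * 0.1,
    "gen_w2": rng.standard_normal((HIDDEN, G)) * 0.1,
    "pix_w1": rng.standard_normal((P, HIDDEN)) * 0.1,
    "pix_w2": rng.standard_normal((HIDDEN, P)) * 0.1,
    "red_w1": rng.standard_normal((P, HIDDEN)) * 0.1,
    "red_w2": rng.standard_normal((HIDDEN, P // 2)) * 0.1,
}
out = mixer_block(rng.random((P, G)), params)
print(out.shape)  # (8, 64)
```

Halving the Pixel-C dimension at each block is one plausible way to form the contracting side of a U-shaped structure; the expanding side and the skip connections are not shown here.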

To address the heavy computational cost of existing models, the invention provides a multi-scale network based on a mixed multilayer perceptron. The MLP model consumes very few computing resources: it needs only multiplication and accumulation at corresponding positions, and its parameters do not grow exponentially as network layers are stacked, so it overcomes this defect of existing models.

To address the poor expressive capability of the MLP-Mixer model, the invention provides a multi-scale, multi-channel U-shaped network. It converts the channel dimension to 2^n (n is an integer) to unify the channel data of different data sets. The core of the module is the 1 × 1 convolution operation, which expands the representation of the various features of a pixel to obtain a multidimensional distribution. It extracts multiple fragments from a single pixel to strengthen expressive capability. Finally, the representation capability of the model is further enhanced by stacking a hierarchical structure similar to U-Net.

The data flow of the present invention is shown in FIG. 2.

The present invention has been tested extensively on a widely adopted public data set. MUMLP outperforms the state-of-the-art methods across the board: on the Houston 2018 data set, its average accuracy is 6.61% higher than that of CAGU, 5.47% higher than that of MLP-Mixer, and 14.17% and 14.64% higher than that of OTVCA.

The results of the method are visualized in FIG. 3.

Claims

1. The multi-scale network analysis method based on the mixed multilayer perceptron is characterized by comprising an MSC block and a UMLP block.

2. The method of multi-scale network analysis based on hybrid multi-layer perceptron according to claim 1, characterized in that said MSC block comprises the following steps:

3. The hybrid multi-layered perceptron-based multi-scale network analysis method according to claim 1, wherein the UMLP block comprises the steps of: