CN113255727A - Multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network - Google Patents

Multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network

Info

Publication number
CN113255727A
CN113255727A (application CN202110446906.6A)
Authority
CN
China
Prior art keywords
layer
convolution
pixel
conv2
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110446906.6A
Other languages
Chinese (zh)
Inventor
王相海
冯一宁
宋若曦
穆振华
宋传鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Normal University
Original Assignee
Liaoning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Normal University filed Critical Liaoning Normal University
Priority to CN202110446906.6A
Publication of CN113255727A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 - Distances to prototypes
    • G06F 18/24137 - Distances to cluster centroïds
    • G06F 18/2414 - Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/58 - Extraction of image or video features relating to hyperspectral data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network, and belongs to the field of remote sensing image processing. First, a network framework with three branches (spatial, spectral and elevation) is introduced to extract the spatial and spectral features of a hyperspectral image and the spatial-elevation features of a LiDAR image, respectively. Second, a modal attention mechanism for multi-sensor remote sensing images is proposed, which exploits the correlation and the differences among data of different modalities to obtain modality-specific features. Finally, the features produced by the modal attention mechanism and the self-attention mechanism are fused using the Flatten and Concatenate operations of a convolutional neural network and classified with a softmax activation function, realizing ground-object classification based on multi-sensor remote sensing images.

Description

Multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network
Technical Field
The invention relates to the field of remote sensing image processing, and in particular to a multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network that offers good fusion quality, high classification accuracy and strong multi-modal data interaction capability.
Background
As a non-contact Earth-observation technology, remote sensing has been widely applied to the classification of land-cover objects. Among the many types of sensors, hyperspectral imagery provides a detailed spectral description of ground objects in a single data cube and can therefore distinguish objects that share the same elevation but have different spectral characteristics, such as a road surface and a lawn at the same height. However, the spatial resolution of hyperspectral images is limited, and phenomena such as spectral mixing and "different objects with the same spectrum" are common, which seriously degrades classification accuracy in complex scenes. LiDAR data, on the other hand, provides height information and can better distinguish objects that have the same spectral characteristics but different elevations. Because conventional LiDAR operates in a single band, the three-dimensional spatial information it acquires usually supports only the classification and recognition of coarse object categories and cannot provide a fine interpretation of a remote sensing scene. Taking the public hyperspectral and LiDAR-DSM images of a campus area in Florida as an example, the hyperspectral image can accurately separate grassland from pavement but cannot separate pavement from building roofs made of the same material; conversely, LiDAR accurately separates buildings and pavement of different heights but cannot effectively separate pavement and lawns of the same height. The hyperspectral image and the LiDAR data therefore carry rich complementary information; if this information can be fully exploited for cooperative ground-object analysis, the advantages of the two sensor types can be fused, the performance of intelligent processing algorithms can be improved, and ground-object information can be interpreted more comprehensively. Against this background, methods for the fusion classification of hyperspectral imagery and LiDAR data have received considerable attention.
Mercier et al. introduced the Support Vector Machine (SVM) with a nonlinear kernel function into remote sensing image classification in 2003. The Extreme Learning Machine (ELM) was introduced into remote sensing image classification by Li et al. in 2015 and achieved performance comparable to the SVM. However, the classification accuracy of these methods remains low. Rasi et al. proposed a hyperspectral image-LiDAR data fusion classification method based on sparsity and low-rank decomposition in 2017, which uses sparsity to capture the spatial redundancy of image features and improve the spatial smoothness of the fused features, effectively alleviating the Hughes phenomenon during fusion. Owing to the curse of dimensionality of hyperspectral images, however, the accuracy of this method is still clearly insufficient. Xue et al. proposed a hyperspectral image-LiDAR data fusion model based on coupled higher-order tensor decomposition in 2019, which extracts more latent features and, to a certain extent, overcomes the low classification accuracy and the Hughes effect introduced by the fusion process in the above techniques.
In recent years, the computing power and data acquisition capability of computing devices have grown rapidly. The increase in computing power mitigates the inefficiency of training, and the increase in training data reduces the risk of overfitting. Complex models represented by deep learning techniques such as Convolutional Neural Networks (CNN) have therefore been applied increasingly to the classification of hyperspectral remote sensing images, with results superior to those of conventional machine learning methods. In 2017, Li et al. trained a CNN on the pixels of a hyperspectral image and realized pixel-level classification. However, that method treats each individual pixel as a whole and ignores the spectral characteristics specific to hyperspectral imagery, so the model accuracy is insufficient. In 2018, Xu et al. proposed a dual-branch convolutional neural network framework for hyperspectral image classification, introducing a spectral-domain branch and a spatial-domain branch to classify the spectral and spatial features of the hyperspectral image cooperatively. Because the spectral characteristics of the hyperspectral image are taken into account, the classification accuracy improves, but the accuracy for ground objects with the same elevation remains poor. On this basis, Hao et al. introduced the elevation information provided by LiDAR data into the dual-branch convolutional neural network structure in 2018 and proposed a hyperspectral image-LiDAR data collaborative fusion classification framework based on a neural network and composite kernels. The framework uses three convolutional branches to extract the spectral, spatial and elevation features of ground objects, which improves classification accuracy. Although such a multi-branch input structure reduces the information loss of different modalities during fusion, it still establishes no connection between the spatial information of the two modalities. In 2020, Hong et al. combined the generative adversarial network (GAN) with multi-modal deep learning and proposed a cross-modal network model with a GAN as the main framework. The method fuses shallow features in a single-stage manner and, to a certain extent, considers the spatial correlation and interaction between features of different modalities. However, because only the shallow features of each modality are processed, the deep spatial correlation between the multi-source data cannot be fully exploited, and there is still room for improving classification accuracy.
In general, cooperatively analysing the characteristics of the different modalities of remote sensing images yields higher-quality fusion results. Unfortunately, most existing techniques fuse the modalities at the front end of the network in a single stage, which usually ignores the correlation and interaction of spatial information between features of different modalities, and a multi-branch network structure alone cannot establish a sufficient relationship between the spatial information of the two modalities. At present, there is no fusion classification method that exploits the common and modality-specific characteristics of multi-sensor remote sensing images to increase cross-modal interaction and thereby significantly improve fusion quality and classification accuracy; the existing technical solutions still suffer from weak cross-modal interaction, poor fusion quality and limited classification accuracy.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art and provides a multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network, which offers good fusion quality, high classification accuracy and strong multi-modal data interaction capability.
The technical solution of the invention is as follows: a multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network, characterized by comprising the following steps:
Step 1. Establish and initialize a convolutional neural network N_ahd for the fusion and classification of multi-sensor remote sensing images. N_ahd comprises two sub-networks N_featureSpe and N_featureSpa for feature extraction, one sub-network N_shallowfusion for shallow feature fusion, one sub-network N_deepfusion for deep feature fusion, and one sub-network N_cls for classification;
Step 1.1 establishing and initializing a sub-network NfeatureSpa4 groups of convolutional layers, Conv2_0, Conv2_1, Conv2_2 and Conv2_ 3;
the Conv2_0 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 3 x 3, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_1 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 3 x 3, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_2 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 1 × 1, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_3 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 1 × 1, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
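For illustration, a minimal PyTorch-style sketch of the spatial branch N_featureSpa of step 1.1 is given below; the number of input channels and the padding that keeps the 11×11 block size are assumptions not fixed by the text above.

    import torch.nn as nn

    def conv_bn_relu(in_ch, out_ch, kernel):
        # One Conv2_x group: convolution (stride 1 pixel) + BatchNorm + ReLU.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=1, padding=kernel // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    class FeatureSpa(nn.Module):
        # Spatial branch N_featureSpa: Conv2_0 .. Conv2_3 of step 1.1.
        def __init__(self, in_channels):
            super().__init__()
            self.conv2_0 = conv_bn_relu(in_channels, 100, 3)
            self.conv2_1 = conv_bn_relu(100, 100, 3)
            self.conv2_2 = conv_bn_relu(100, 100, 1)
            self.conv2_3 = conv_bn_relu(100, 100, 1)

        def forward(self, x):          # x: (batch, in_channels, 11, 11) pixel blocks
            x = self.conv2_0(x)
            x = self.conv2_1(x)
            x = self.conv2_2(x)
            return self.conv2_3(x)     # shallow spatial feature F_spa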
Step 1.2. Establish and initialize the sub-network N_featureSpe, which comprises 2 groups of convolutional layers: Conv1_0 and Conv1_1;
the Conv1_0 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 64 one-dimensional convolution kernels of size 11, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
the Conv1_1 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 128 one-dimensional convolution kernels of size 3, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
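Similarly, a sketch of the spectral branch N_featureSpe of step 1.2 follows; the (batch, 1, number-of-bands) input layout and the absence of padding are assumptions.

    import torch.nn as nn

    class FeatureSpe(nn.Module):
        # Spectral branch N_featureSpe: Conv1_0 and Conv1_1 of step 1.2.
        def __init__(self):
            super().__init__()
            self.conv1_0 = nn.Sequential(       # 64 one-dimensional kernels of size 11
                nn.Conv1d(1, 64, kernel_size=11, stride=1),
                nn.BatchNorm1d(64),
                nn.ReLU(inplace=True),
            )
            self.conv1_1 = nn.Sequential(       # 128 one-dimensional kernels of size 3
                nn.Conv1d(64, 128, kernel_size=3, stride=1),
                nn.BatchNorm1d(128),
                nn.ReLU(inplace=True),
            )

        def forward(self, spectrum):            # spectrum: (batch, 1, num_bands)
            return self.conv1_1(self.conv1_0(spectrum))   # shallow spectral feature F_spe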
Step 1.3. Establish and initialize the sub-network N_shallowfusion, which comprises 6 groups of parallel convolutional layers, Conv2_Q1, Conv2_K1, Conv2_V1, Conv2_Q2, Conv2_K2 and Conv2_V2, and 2 groups of custom modules, L_SAM and L_DPAM;
the Conv2_Q1 group comprises 1 convolution layer containing 25 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the Conv2_K1 group comprises 1 convolution layer containing 25 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the Conv2_V1 group comprises 1 convolution layer containing 200 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the Conv2_Q2 group comprises 1 convolution layer containing 25 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the Conv2_K2 group comprises 1 convolution layer containing 25 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the Conv2_V2 group comprises 1 convolution layer containing 200 two-dimensional convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel;
the L_SAM module maps the input three-dimensional tensor F to the space ℝ^(C_spe×N_2) by a reshape operation to obtain the feature F_speR ∈ ℝ^(C_spe×N_2), where C_spe denotes the number of input channels, N_2 = 1×1, and F_speR,i denotes the i-th channel of F_speR; the spectral attention matrix F_speS ∈ ℝ^(C_spe×C_spe) is then computed according to the definition of formula (1), where F_speS(j,i) denotes the element in the j-th row and i-th column of F_speS, F_speR,j^T denotes the transpose of the j-th channel of F_speR, F_speR,i denotes the i-th channel of F_speR, and "·" denotes the inner product operation; F_speR and F_speS are then multiplied as matrices according to the definition of formula (2) to obtain the spectral attention feature F_speA, where γ denotes a preset coefficient;
the L_DPAM module comprises the following 7 steps:
(a) the input three-dimensional tensor F_1 is fed into the convolutional layer Conv2_Q1 to compute the feature F_Q1 ∈ ℝ^(K_1×H_spa×W_spa), F_1 is then fed into the convolutional layer Conv2_K1 to compute the feature F_K1 ∈ ℝ^(K_2×H_spa×W_spa), and F_1 is then fed into the convolutional layer Conv2_V1 to compute the feature F_V1 ∈ ℝ^(K_3×H_spa×W_spa), where F_Q1,i, F_K1,i and F_V1,i denote the i-th elements of F_Q1, F_K1 and F_V1 respectively, C_spa denotes the number of channels of the input tensor, H_spa and W_spa denote the height and width of the input tensor, and K_1 = 25, K_2 = 25, K_3 = 200;
(b) the L_DPAM module feeds the three-dimensional tensor F_2 into the convolutional layer Conv2_Q2 to compute the feature F_Q2, then feeds F_2 into the convolutional layer Conv2_K2 to compute the feature F_K2, and then feeds F_2 into the convolutional layer Conv2_V2 to compute the feature F_V2, where F_Q2,i, F_K2,i and F_V2,i denote the i-th elements of F_Q2, F_K2 and F_V2 respectively;
(c) F_Q1 and F_K1 are mapped to the space ℝ^(K_1×N_1) by a reshape operation, and the spatial attention matrix F_spaX ∈ ℝ^(N_1×N_1) is computed according to the definition of formula (3), where N_1 denotes the total number of features and N_1 = H_spa × W_spa, F_spaX(j,i) denotes the element in the j-th row and i-th column of F_spaX, and F_K1,j^T denotes the transpose of the j-th element of F_K1;
(d) F_V1 is mapped to the space ℝ^(K_3×N_1) by a reshape operation, and the spatial attention feature F_spaA is computed according to the definition of formula (4), where η_spa is a preset scaling factor and F_spaX,i denotes the vector formed by the elements of the i-th row of F_spaX;
(e) F_Q2 and F_K2 are mapped to the space ℝ^(K_1×N_1) by a reshape operation, and the attention matrix F_mX ∈ ℝ^(N_1×N_1) is computed according to the definition of formula (5), where F_mX(j,i) denotes the element in the j-th row and i-th column of F_mX and F_K2,j^T denotes the transpose of the j-th element of F_K2;
(f) the modal attention feature F_mA is computed according to the definition of formula (6), where ε_2 denotes a preset scaling factor and F_mX,i denotes the vector formed by the elements of the i-th row of F_mX;
(g) the spatially weighted feature F_maF is computed according to the definition of formula (7):

F_maF = α_1·F_1 + α_2·F_spaA + α_3·F_mA   (7)

where α_1, α_2 and α_3 denote preset weight coefficients;
Step 1.4. Establish and initialize the sub-network N_deepfusion, which comprises 2 groups of maximum pooling layers and 1 group of custom connection layers: MaxPool1, MaxPool2 and Concatenate;
the MaxPool1 group comprises 1 pooling layer and 1 Flatten operation, wherein the pooling layer performs maximum pooling with a one-dimensional pooling kernel of size 1;
the MaxPool2 group comprises 1 pooling layer, 2 fully connected layers, 2 activation layers and 1 Flatten operation, wherein the pooling layer performs maximum pooling with a pooling kernel of size 2×2, the 2 fully connected layers have 1024 and 512 output units respectively, ReLU is used as the activation function, and a Dropout operation with parameter 0.4 is applied; applying MaxPool1 and MaxPool2 yields the 3 tensors F'_spe, F'_spa and F'_L;
the Concatenate layer fuses F'_spe, F'_spa and F'_L according to formula (8) and applies a Dropout operation with parameter 0.5 three times:

F_M = ω·(F'_spe ‖ F'_spa ‖ F'_L) + b   (8)

where ω and b denote the weights and biases of the fully connected layers and "‖" denotes the operation of connecting the spectral features with the spatial features;
Step 1.5. Establish and initialize the sub-network N_cls, which comprises 1 group of fully connected layers: Dense1;
the Dense1 layer has num classification units and uses Softmax as its activation function, where num denotes the total number of ground-object classes to be classified;
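The deep fusion and classification stages (steps 1.4 and 1.5) can be sketched as follows; the flattened feature widths spe_dim and spa_dim, the width of the fused layer and the placement of the three Dropout(0.5) operations around the fully connected fusion are assumptions, since only the 1024/512 unit counts, the Dropout rates and the num-way Softmax head are fixed above.

    import torch
    import torch.nn as nn

    class DeepFusionClassifier(nn.Module):
        # Sketch of N_deepfusion (MaxPool1, MaxPool2, Concatenate) plus N_cls (Dense1).
        def __init__(self, spe_dim, spa_dim, num_classes, fused_dim=512):
            super().__init__()
            self.pool_spe = nn.Sequential(nn.MaxPool1d(kernel_size=1), nn.Flatten())  # MaxPool1
            self.pool_spa = nn.Sequential(                                            # MaxPool2
                nn.MaxPool2d(kernel_size=2), nn.Flatten(),
                nn.Linear(spa_dim, 1024), nn.ReLU(inplace=True),
                nn.Linear(1024, 512), nn.ReLU(inplace=True),
                nn.Dropout(0.4),
            )
            self.fuse = nn.Sequential(               # Concatenate: Eq. (8) with Dropout(0.5) x 3
                nn.Dropout(0.5),
                nn.Linear(spe_dim + 512 + 512, fused_dim),
                nn.Dropout(0.5), nn.Dropout(0.5),
            )
            self.cls = nn.Linear(fused_dim, num_classes)        # Dense1 with num units

        def forward(self, f_spe_a, f_ma_hf, f_ma_lf):
            f_spe = self.pool_spe(f_spe_a)                      # deep spectral feature F'_spe
            f_spa = self.pool_spa(f_ma_hf)                      # deep spatial feature F'_spa
            f_l = self.pool_spa(f_ma_lf)                        # deep elevation feature F'_L
            f_m = self.fuse(torch.cat([f_spe, f_spa, f_l], dim=1))   # Eq. (8)
            return torch.softmax(self.cls(f_m), dim=1)          # classification prediction

Note that MaxPool2 is shared between the hyperspectral spatial branch and the LiDAR elevation branch, which matches steps 2.10.2 and 2.10.3 below.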
Step 2. Input the training set H of the hyperspectral image, the training set L of the LiDAR image, and the manually annotated pixel coordinate set and label set, and train N_ahd;
Step 2.1. According to the manually annotated pixel coordinate set, extract from the hyperspectral training set H the set of all labelled pixels X_H = {x_H,i | i = 1, …, M} and extract from the LiDAR training set L the set of all labelled pixels X_L = {x_L,i | i = 1, …, M}, where x_H,i denotes the i-th pixel of X_H, x_L,i denotes the i-th pixel of X_L, and M denotes the total number of labelled pixels;
Step 2.2. Normalize X_H and X_L according to the definitions of formula (9) and formula (10) to obtain X̂_H and X̂_L, where X̂_H denotes the normalized set of labelled hyperspectral pixels, x̂_H,i denotes the i-th pixel of X̂_H, X̂_L denotes the normalized set of labelled LiDAR pixels, and x̂_L,i denotes the i-th pixel of X̂_L;
Step 2.3. Taking each pixel of X̂_H as a centre, divide H into a series of hyperspectral pixel blocks of size 11×11 to form the set X_H1, and taking each pixel of X̂_L as a centre, divide L into a series of LiDAR pixel blocks of size 11×11 to form the set X_L1;
Step 2.4. Flip each pixel block in X_H1 and X_L1 vertically to obtain the hyperspectral pixel block set X_H2 and the LiDAR pixel block set X_L2;
Step 2.5. Add Gaussian noise with variance 0.01 to each pixel block of X_H1 to obtain the hyperspectral pixel block set X_H3, and add Gaussian noise with variance 0.03 to each pixel block of X_L1 to obtain the LiDAR pixel block set X_L3;
Step 2.6. Rotate each pixel block in X_H1 clockwise by a random n×90° about its centre point to obtain the hyperspectral pixel block set X_H4, and rotate each pixel block in X_L1 clockwise by a random n×90° about its centre point to obtain the LiDAR pixel block set X_L4, where n denotes a value randomly selected from the set {1, 2, 3};
Step 2.7. Let X̃_H = X_H1 ∪ X_H2 ∪ X_H3 ∪ X_H4 and X̃_L = X_L1 ∪ X_L2 ∪ X_L3 ∪ X_L4; take X̃_H and X̃_L as the training set of the fusion and classification neural network, and organize the samples of the training set into triples (x̃_H,i, x̃_L,i, Y_i) as the network input, where (x̃_H,i, x̃_L,i) denotes a pixel-block pair consisting of a hyperspectral block and a LiDAR block of the training set whose spatial coordinates are identical, and Y_i denotes the ground-truth class label corresponding to x̃_H,i and x̃_L,i; set the iteration counter iter ← 1 and execute steps 2.8 to 2.13;
Step 2.8. Use the sub-networks N_featureSpe and N_featureSpa to extract the features of the training set;
Step 2.8.1. Use the sub-network N_featureSpe to extract features from the hyperspectral training set X̃_H, obtaining the shallow spectral feature F_spe of the hyperspectral image;
Step 2.8.2. Use the sub-network N_featureSpa to extract features from the hyperspectral training set X̃_H, obtaining the shallow spatial feature F_spa of the hyperspectral image;
Step 2.8.3. Use the sub-network N_featureSpa to extract features from the LiDAR training set X̃_L, obtaining the shallow elevation feature F_L of the LiDAR image;
Step 2.9. Use the sub-network N_shallowfusion to perform feature-level shallow fusion and obtain the shallow features;
Step 2.9.1. Apply the L_SAM module to the shallow spectral feature F_spe to obtain the spectral attention feature F_speA of the hyperspectral image;
Step 2.9.2. Apply the L_DPAM module to the shallow spatial feature F_spa and the shallow spatial feature F_L to obtain the spatial modal attention feature F_maHF of the hyperspectral image;
Step 2.9.3. Apply the L_DPAM module to the shallow spatial feature F_L and the shallow spatial feature F_spa to obtain the spatial modal attention feature F_maLF of the LiDAR image;
Step 2.10. use sub-network NdeepfusionCarrying out deep fusion of characteristic levels to obtain deep characteristics;
step 2.10.1 spectral attention feature F using max-pooling layer MaxPool1speACalculating to obtain deep spectral characteristics of the hyperspectral image
Figure BDA00030372694600000812
Step 2.10.2 spatial modal attention feature F of hyperspectral image by using maximum pooling layer MaxPool2maHFCalculating to obtain deep space characteristics of hyperspectral image
Figure BDA00030372694600000813
Step 2.10.3 utilizes the spatial modal attention feature F of the max pooling layer Maxpool2 for LiDAR imagerymaLFCalculation was carried out to obtain LiDeep elevation features for DAR images
Figure BDA00030372694600000814
Step 2.10.4 utilizes the custom linker Concatenate to characterize the deep spectra of the hyperspectral image
Figure BDA0003037269460000091
Spatial features of deep layers
Figure BDA0003037269460000092
Deep elevation features of LiDAR images
Figure BDA0003037269460000093
Calculating to obtain deep layer characteristics FM
Step 2.11 Using subnetwork NclsClassifying the deep features, and calculating to obtain a classification prediction result TRpred
Step 2.12, taking the weighted cross entropy as a loss function according to the definitions of the formula (11) and the formula (12);
Figure BDA0003037269460000094
Figure BDA0003037269460000095
wherein, ω isjThe weight of the jth class is represented,
Figure BDA0003037269460000096
probability, n, of a picture element belonging to class j terrainjRepresenting the number of the jth class of ground-truth ground objects in the ground-truth training sample;
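Since formula (11) is given only as an image, the exact class-weight definition is not recoverable; the sketch below uses weights inversely proportional to the class counts n_j as one plausible choice and relies on the standard weighted cross-entropy for formula (12).

    import torch
    import torch.nn.functional as F_t

    def weighted_cross_entropy(pred, target, class_counts):
        # pred: raw class scores before Softmax; target: ground-truth class indices.
        counts = torch.as_tensor(class_counts, dtype=torch.float32)
        weights = counts.sum() / (len(counts) * counts)         # assumed form of omega_j (Eq. (11))
        return F_t.cross_entropy(pred, target, weight=weights)  # weighted cross entropy (Eq. (12))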
Step 2.13. If all pixel blocks in the training set have been processed, go to step 2.14; otherwise take an unprocessed group of pixel blocks from the training set and return to step 2.8;
Step 2.14. Let iter ← iter + 1. If the iteration count iter > Total_iter, the trained convolutional neural network N_ahd is obtained and the procedure goes to step 3; otherwise, update the parameters of N_ahd with the back-propagation algorithm based on stochastic gradient descent and the prediction loss L_ω-C, and return to step 2.8 to reprocess all pixel blocks of the training set, where Total_iter denotes the preset number of iterations;
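A minimal training loop corresponding to steps 2.8 to 2.14 might look as follows, assuming the whole network N_ahd is wrapped in a single module whose forward pass runs feature extraction, shallow fusion, deep fusion and classification; the learning rate is an assumption, since only stochastic-gradient-descent back-propagation and the iteration count are specified.

    import torch

    def train(model, loader, loss_fn, total_iter, lr=0.01):
        # loader yields the triples (x_H, x_L, Y) of step 2.7; loss_fn is the
        # weighted cross entropy of step 2.12.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(total_iter):
            for hsi_block, lidar_block, label in loader:
                opt.zero_grad()
                pred = model(hsi_block, lidar_block)     # TR_pred: raw class scores
                loss = loss_fn(pred, label)              # prediction loss L_omega-C
                loss.backward()                          # reverse error propagation
                opt.step()
        return model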
Step 3. Input an unlabelled hyperspectral image H' and LiDAR image L', preprocess all pixels of H' and L', and use the trained convolutional neural network N_ahd to complete the pixel classification;
Step 3.1. Extract all pixels of H' to form the set T_H = {t_H,i | i = 1, …, U} and all pixels of L' to form the set T_L = {t_L,i | i = 1, …, U}, where t_H,i denotes the i-th pixel of T_H, t_L,i denotes the i-th pixel of T_L, and U denotes the total number of pixels;
Step 3.2. Normalize T_H and T_L according to the definitions of formula (17) and formula (18) to obtain T̂_H and T̂_L, where T̂_H denotes the normalized set of hyperspectral pixels, t̂_H,i denotes the i-th pixel of T̂_H, T̂_L denotes the normalized set of LiDAR pixels, and t̂_L,i denotes the i-th pixel of T̂_L;
Step 3.3. Taking each pixel of T̂_H as a centre, divide H' into a series of hyperspectral pixel blocks of size 11×11 to form the hyperspectral image test set T_H1, and taking each pixel of T̂_L as a centre, divide L' into a series of LiDAR pixel blocks of size 11×11 to form the LiDAR image test set T_L1;
Step 3.4. use sub-network NfeatureSpeAnd NfeatureSpaExtracting the characteristics of the test set;
step 3.4.1 utilizing sub-network NfeatureSpeTo pair
Figure BDA0003037269460000106
Carrying out feature extraction to obtain spectral feature T of hyperspectral image Hspe
Step 3.4.2 utilizes sub-network NfeatureSpaTo pair
Figure BDA0003037269460000107
Carrying out feature extraction to obtain spatial feature T of hyperspectral image Hspa
Step 3.4.3 utilizing sub-network NfeatureSpaTo pair
Figure BDA0003037269460000108
Extracting features to obtain the elevation features T of the LiDAR image LL
Step 3.5. use sub-network NshallowfusionPerforming shallow layer fusion of a characteristic level to obtain shallow layer characteristics;
step 3.5.1 Using LSAMModule pair spectral feature TspeCalculating to obtain the spectral attention characteristic T of the hyperspectral image HspeA
Step 3.5.2 Using LDPAMModule to space characteristics TspaAnd spatial feature TLCalculating to obtain the spatial modal attention feature T of the hyperspectral image HmaHF
Step 3.5.3 utilizes LDPAMModule to space characteristics TLAnd spatial feature TspaCalculating to obtain the space modal attention feature T of the LiDAR image LmaLF
Step 3.6. use sub-network NdeepfusionCarrying out deep fusion of characteristic levels to obtain deep characteristics;
step 3.6.1 spectral attention feature T using max-pooling layer MaxPool1speACalculating to obtain deep spectral characteristics of the hyperspectral image H
Figure BDA0003037269460000109
Step 3.6.2 attention feature T to spatial modality with max pooling layer Maxpool2maHFCalculating to obtain deep spatial features of the hyperspectral image H
Figure BDA00030372694600001010
Step 3.6.3 utilizes the max pooling layer Maxpool2 for the spatial modal attention feature TmaLFCalculating to obtain the deep elevation features of the LiDAR image L
Figure BDA00030372694600001011
Step 3.6.4 utilizes custom connection layer conditioner pairs
Figure BDA00030372694600001012
Calculating to obtain deep layer characteristics TM
Step 3.7 Using subnetwork NclsFor deep layer characteristic TMClassifying to calculate the classified prediction result TEpred
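At test time (step 3), the trained network is simply run over every 11×11 pixel-block pair and the arg-max class is taken per pixel; the batching below and the assumption that normalization and patch extraction have already been done are illustrative.

    import torch

    def classify_scene(model, hsi_patches, lidar_patches, batch_size=256):
        # hsi_patches / lidar_patches: tensors of co-registered 11x11 test blocks.
        model.eval()
        preds = []
        with torch.no_grad():
            for start in range(0, len(hsi_patches), batch_size):
                h = hsi_patches[start:start + batch_size]
                l = lidar_patches[start:start + batch_size]
                scores = model(h, l)                     # TE_pred class scores
                preds.append(scores.argmax(dim=1))
        return torch.cat(preds)                          # one class label per pixel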
Compared with the prior art, the invention has two advantages. First, it introduces a multi-modal fusion classification framework based on a hierarchical dense fusion network with multiple attention mechanisms, which organically combines the spatial-spectral-elevation three-branch framework of the hyperspectral and LiDAR images with the attention mechanisms and thereby improves the accuracy of ground-object fusion classification. Second, by fusing the shallow spectral and spatial features of the hyperspectral image with the shallow spatial features of the LiDAR image, a modal attention mechanism for shallow feature fusion is designed to discover the correlation and diversity among the multi-modal data of the same ground object, realizing interaction and complementarity between data of different modalities. The method therefore offers good fusion quality, high classification accuracy and strong multi-modal data interaction capability. Experimental results show that the overall accuracy of the method reaches 90.06% on the Houston data set and 99.03% on the Trento data set, the average accuracy reaches 92.25% and 98.32%, and the Kappa coefficient reaches 89.24% and 98.70%, respectively, effectively improving the classification accuracy of ground objects.
Drawings
FIG. 1 is a comparison of the fusion classification results of the method of the present invention with the SVM, ELM, CNN-PPF, Two-Branch CNN, and EndNet methods on the Houston data set.

FIG. 2 is a comparison of the fusion classification results of the method of the present invention with the SVM, ELM, CNN-PPF, Two-Branch CNN, and EndNet methods on the Trento data set.
Detailed Description
The multi-sensor remote sensing image fusion classification method based on a hierarchical dense fusion network disclosed by the invention is carried out according to the following steps:
step 1, establishing and initializing a convolutional neural network N for fusion and classification of multi-sensor remote sensing imagesahdSaid N isahdComprising 2 sub-networks N for feature extractionfeatureSpeAnd N featureSpa1 sub-network N for shallow feature fusion shallowfusion1 sub-network N for deep feature fusiondeepfusionAnd 1 sub-network N for classificationcls
Step 1.1. Establish and initialize the sub-network N_featureSpa, which comprises 4 groups of convolutional layers: Conv2_0, Conv2_1, Conv2_2 and Conv2_3;
the Conv2_0 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 100 convolution kernels of size 3×3, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
the Conv2_1 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 100 convolution kernels of size 3×3, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
the Conv2_2 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 100 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
the Conv2_3 group comprises 1 convolution layer, 1 BatchNorm normalization layer and 1 activation layer, wherein the convolution layer contains 100 convolution kernels of size 1×1, each kernel convolving with a step size of 1 pixel, and the nonlinear activation function ReLU is used for activation;
step 1.2. establishing and initializing sub-network N featureSpe2 groups of convolutional layers, Conv1_0 and Conv1_ 1;
the Conv1_0 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 64 one-dimensional convolution kernels with the size of 11, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv1_1 comprises a 1-layer convolution operation, a 1-layer BatchNorm normalization operation and a 1-layer activation operation, wherein the convolution layer comprises 128 one-dimensional convolution kernels with the size of 3, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
step 1.3. establishing and initializing sub-network NshallowfusionComprising 6 sets of parallel convolutional layers, Conv2_ Q1, Conv2_ K1, Conv2_ V1, Conv2_ Q2, Conv2_ K2 and Conv2_ V2, and 2 sets of custom modules, LSAM、LDPAM
The Conv2_ Q1 comprises 1 layer of convolution operations, including 25 convolution kernels of size 1 × 1, each convolution kernel performing convolution operations in steps of 1 pixel;
the Conv2_ K1 includes 1 layer of convolution operations, including 25 convolution kernels of size 1 × 1, each convolution kernel performing convolution operations with 1 pixel step size;
the Conv2_ V1 comprises 1 layer of convolution operations, including 200 convolution kernels of size 1 × 1, each convolution kernel performing convolution operations with a step size of 1 pixel;
the Conv2_ Q2 comprises 1 layer of convolution operations, including 25 convolution kernels of size 1 × 1, each convolution kernel performing convolution operations in steps of 1 pixel;
the Conv2_ K2 includes 1 layer of convolution operations, including 25 convolution kernels of size 1 × 1, each convolution kernel performing convolution operations with 1 pixel step size;
the Conv2_ V2 comprises 1-layer convolution operation, including 200 2D convolution kernels with the size of 1 × 1, and each convolution kernel carries out convolution operation by taking 1 pixel as a step size;
said LSAMThe module maps the input three-dimensional tensor F to using reshape operation
Figure BDA0003037269460000131
Space, get the characteristic
Figure BDA0003037269460000132
Wherein, CspeIndicating the number of input channels, N2=1×1,
Figure BDA0003037269460000133
Is represented by FspeRThe ith channel of (2), and then calculating the spectrum attention moment array according to the definition of the formula (1)
Figure BDA0003037269460000134
Figure BDA0003037269460000135
Wherein the content of the first and second substances,
Figure BDA0003037269460000136
is represented by FspeSThe element in the jth row and ith column,
Figure BDA0003037269460000137
is represented by FspeRThe transpose of the jth lane of (1),
Figure BDA0003037269460000138
is represented by FspeRRepresents the inner product operation, and further defines F according to the formula (2)speRAnd FspeSPerforming matrix multiplication to obtain spectral attention feature FspeA
Figure BDA0003037269460000139
Where γ represents a preset coefficient, and in this embodiment, γ is made to be 0.4;
said LDPAMThe module comprises the following 7 steps:
(a) three-dimensional tensor F to be input1Feeding into the convolutional layer Conv2_ Q1 to calculate the characteristics
Figure BDA00030372694600001310
Then F is mixed1Feeding into the convolutional layer Conv2_ K1 to calculate the characteristics
Figure BDA00030372694600001311
Then F is put1Sending the convolution layer Conv2_ V1 to calculate the characteristics
Figure BDA00030372694600001312
Wherein, FQ1,i、FK1,iAnd FV1,iRespectively represent FQ1、FK1And FV1The ith element of (1), CspaNumber of channels representing input tensor, HspaAnd WspaRespectively representing the length and width of the input tensor, K1=25,K2=25,K3=200;
(b)LDPAMModule three-dimensional tensor F2Feeding into the convolutional layer Conv2_ Q2 to calculate the characteristics
Figure BDA00030372694600001313
Then F is mixed2Feeding into the convolutional layer Conv2_ K2 to calculate the characteristics
Figure BDA00030372694600001314
Then F is put2Sending the convolution layer Conv2_ V2 to calculate the characteristics
Figure BDA00030372694600001315
Wherein, FQ2,i、FK2,iAnd FV2,iRespectively represent FQ2、FK2And FV2The ith element of (1);
(c) f is processed by reshape operationQ1And FK1Mapping to
Figure BDA0003037269460000141
Space and calculating a space attention moment matrix according to the definition of the formula (3)
Figure BDA0003037269460000142
Figure BDA0003037269460000143
Wherein N is1Represents the total number of features and N1=Hspa×Wspa
Figure BDA0003037269460000144
Is represented by FspaXThe element in the jth row and ith column,
Figure BDA0003037269460000145
is represented by FK1Transpose of jth element of (a);
(d) f is processed by reshape operationV1Mapping to
Figure BDA0003037269460000146
Space, calculating the spatial attention feature F according to the definition of formula (4)spaA
Figure BDA0003037269460000147
Wherein eta isspaIs a pre-set scaling factor that is,
Figure BDA0003037269460000148
is represented by FspaXThe vector formed by the ith row element of (1), let η in this embodimentspa=0.4;
(e) F is processed by reshape operationQ2And FK2Mapping to
Figure BDA0003037269460000149
Space, calculating a space attention moment array according to the definition of formula (5)
Figure BDA00030372694600001410
Figure BDA00030372694600001411
Wherein the content of the first and second substances,
Figure BDA00030372694600001412
is represented by FmXThe element in the jth row and ith column,
Figure BDA00030372694600001413
is represented by FK2Transpose of jth element of (a);
(f) calculating the modal attention feature F according to the definition of formula (6)mA
Figure BDA00030372694600001414
Wherein epsilon2Which represents a preset scaling factor, is set,
Figure BDA00030372694600001415
is represented by FmXThe vector formed by the ith row element of (1), let ε in this embodiment2=0.4;
(g) Calculating the spatial weighting feature F according to the definition of equation (7)maF
FmaF=α1F12FspaA3FmA (7)
Wherein alpha is1,α2And alpha3Represents a predetermined weight coefficient, let α in this embodiment1=0.4,α2=0.3, α3=0.3;
Step 1.4. Establish and initialize the sub-network N_deepfusion, which comprises 2 groups of maximum pooling layers and 1 group of custom connection layers: MaxPool1, MaxPool2 and Concatenate;
the MaxPool1 group comprises 1 pooling layer and 1 Flatten operation, wherein the pooling layer performs maximum pooling with a one-dimensional pooling kernel of size 1;
the MaxPool2 group comprises 1 pooling layer, 2 fully connected layers, 2 activation layers and 1 Flatten operation, wherein the pooling layer performs maximum pooling with a pooling kernel of size 2×2, the 2 fully connected layers have 1024 and 512 output units respectively, ReLU is used as the activation function, and a Dropout operation with parameter 0.4 is applied; applying MaxPool1 and MaxPool2 yields the 3 tensors F'_spe, F'_spa and F'_L;
the Concatenate layer fuses F'_spe, F'_spa and F'_L according to formula (8) and applies a Dropout operation with parameter 0.5 three times:

F_M = ω·(F'_spe ‖ F'_spa ‖ F'_L) + b   (8)

where ω and b denote the weights and biases of the fully connected layers and "‖" denotes the operation of connecting the spectral features with the spatial features;
Step 1.5. Establish and initialize the sub-network N_cls, which comprises 1 group of fully connected layers: Dense1;
the Dense1 layer has num classification units and uses Softmax as its activation function, where num denotes the total number of ground-object classes to be classified;
step 2, inputting a training set L of a training set H, LiDAR image of a hyperspectral image, a pixel point coordinate set and a label set which are marked artificially, and performing comparison on NahdTraining is carried out;
step 2.1, extracting all pixel point sets X with labels from a hyperspectral image training set H according to the artificially marked pixel point coordinate setH={xH,i1, …, M, and extracting pixel point set X with all labels from training set L of LiDAR imageL={xL,i1, …, M, where xH,iRepresents XHThe ith pixel point, xL,iRepresents XLM represents the total number of pixel points having labels;
step 2.2. according to the definition of the formula (9) and the formula (10), X is definedHAnd XLPerforming standardization treatment to obtain
Figure BDA0003037269460000156
And
Figure BDA0003037269460000157
wherein the content of the first and second substances,
Figure BDA0003037269460000158
representing a normalized set of labeled hyperspectral image primitive points,
Figure BDA0003037269460000159
to represent
Figure BDA00030372694600001510
The point of the ith pixel of (a),
Figure BDA00030372694600001511
represents a normalized set of labeled LiDAR pixel points,
Figure BDA00030372694600001512
to represent
Figure BDA00030372694600001513
The ith pixel point of (1);
Figure BDA00030372694600001514
Figure BDA00030372694600001515
step 2.3. with
Figure BDA0003037269460000161
Is divided into a series of high spectral pel block sets X of size 11X 11 centered on each pel point of HH1And are combined with
Figure BDA0003037269460000162
Divides L into a series of sets X of LiDAR image metablocks of size 11X 11 centered on each image metablock ofL1
Step 2.4. mixing XH1And XL1Each image element block in the image acquisition system is turned over up and down to obtain a high-spectrum image element block set XH2And LiDAR pixelblock set XL2
Step 2.5 for XH1Adding Gaussian noise with variance of 0.01 to each pixel block to obtain a hyperspectral pixel block set XH3And to XL1Adding Gaussian noise with variance of 0.03 to each pixel block to obtain a LiDAR pixel block set XL3
Step 2.6. mixing XH1Each pixel block in the hyperspectral image block set X rotates by n multiplied by 90 degrees clockwise and randomly by taking the central point as a rotation center to obtain a hyperspectral image block set XH4And X isL1Each pixel block in the LiDAR pixel block set X is obtained by clockwise randomly rotating n multiplied by 90 degrees by taking the central point as the rotating centerL4Wherein n represents a value randomly selected from the set {1,2,3 };
step 2.7. order
Figure BDA0003037269460000163
And
Figure BDA0003037269460000164
will be provided with
Figure BDA0003037269460000165
And
Figure BDA0003037269460000166
as a training set for fusing and classifying neural networks, and integrating samples in the training set into a triad
Figure BDA0003037269460000167
In the form of a network data input, wherein,
Figure BDA0003037269460000168
represents a pixel pair consisting of a hyperspectral image and a LiDAR image in the training set, and
Figure BDA0003037269460000169
and
Figure BDA00030372694600001610
are the same in spatial coordinates of (a) YiTo represent
Figure BDA00030372694600001611
And
Figure BDA00030372694600001612
making the iteration number iter ← 1 for the corresponding real category label, and executing the step 2.8 to the step 2.13;
step 2.8. adopt the subnetwork NfeatureSpeAnd NfeatureSpaExtracting the characteristics of the training set;
step 2.8.1 utilizing subnetwork NfeatureSpeTraining set for hyperspectral images
Figure BDA00030372694600001613
Performing feature extraction to obtain shallow spectrum features of hyperspectral imageSign Fspe
Step 2.8.2 utilizing sub-network NfeatureSpaTraining set for hyperspectral images
Figure BDA00030372694600001614
Carrying out feature extraction to obtain shallow space features F of the hyperspectral imagespa
Step 2.8.3 utilizing sub-network NfeatureSpaTraining set for LiDAR imagery
Figure BDA00030372694600001615
Performing feature extraction to obtain shallow elevation features F of the LiDAR imageL
Step 2.9. use sub-network NshallowfusionPerforming shallow layer fusion of a characteristic level to obtain shallow layer characteristics;
step 2.9.1 Using LSAMModule pair shallow spectral feature FspeCalculating to obtain the spectral attention characteristic F of the hyperspectral imagespeA
Step 2.9.2 Using LDPAMModule pair shallow space feature FspaAnd shallow space feature FLCalculating to obtain the spatial modal attention feature F of the hyperspectral imagemaHF
Step 2.9.3 Using LDPAMModule pair shallow space feature FLAnd shallow space feature FspaCalculating to obtain the spatial modal attention feature F of the LiDAR imagemaLF
Step 2.10. Use the sub-network N_deepfusion to perform feature-level deep fusion and obtain the deep features;
Step 2.10.1. Apply the maximum pooling layer MaxPool1 to the spectral attention feature F_speA to obtain the deep spectral feature F'_spe of the hyperspectral image;
Step 2.10.2. Apply the maximum pooling layer MaxPool2 to the spatial modal attention feature F_maHF of the hyperspectral image to obtain the deep spatial feature F'_spa of the hyperspectral image;
Step 2.10.3. Apply the maximum pooling layer MaxPool2 to the spatial modal attention feature F_maLF of the LiDAR image to obtain the deep elevation feature F'_L of the LiDAR image;
Step 2.10.4. Apply the custom connection layer Concatenate to the deep spectral feature F'_spe, the deep spatial feature F'_spa and the deep elevation feature F'_L of the LiDAR image to obtain the deep feature F_M;
Step 2.11. Use the sub-network N_cls to classify the deep feature and compute the classification prediction result TR_pred;
Step 2.12. Take the weighted cross entropy as the loss function L_ω-C according to the definitions of formula (11) and formula (12), where ω_j denotes the weight of the j-th class, p_j denotes the probability that a pixel belongs to the j-th class of ground objects, and n_j denotes the number of class-j ground-truth samples in the training set;
Step 2.13. If all pixel blocks in the training set have been processed, go to step 2.14; otherwise take an unprocessed group of pixel blocks from the training set and return to step 2.8;
Step 2.14. Let iter ← iter + 1. If the iteration count iter > Total_iter, the trained convolutional neural network N_ahd is obtained and the procedure goes to step 3; otherwise, update the parameters of N_ahd with the back-propagation algorithm based on stochastic gradient descent and the prediction loss L_ω-C, and return to step 2.8 to reprocess all pixel blocks of the training set, where Total_iter denotes the preset number of iterations; in this embodiment, Total_iter is set to 200;
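For reference, the embodiment-specific hyper-parameter values quoted in this detailed description can be collected as follows; grouping them in a single configuration object is purely illustrative.

    from dataclasses import dataclass

    @dataclass
    class EmbodimentConfig:
        gamma: float = 0.4          # spectral attention coefficient of formula (2)
        eta_spa: float = 0.4        # spatial attention scaling factor of formula (4)
        eps2: float = 0.4           # modal attention scaling factor of formula (6)
        alpha: tuple = (0.4, 0.3, 0.3)   # weights alpha_1, alpha_2, alpha_3 of formula (7)
        dropout_fc: float = 0.4     # Dropout after the fully connected layers of MaxPool2
        dropout_fuse: float = 0.5   # Dropout applied three times in the Concatenate layer
        total_iter: int = 200       # preset number of training iterations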
step 3, inputting unlabeled hyperspectral images H 'and LiDAR images L', performing data preprocessing on all pixels of H 'and L', and adopting a trained convolutional neural network NahdCompleting pixel classification;
step 3.1, extracting all pixel points in H' to form a set TH={tH,iI1, …, U, extracting all pixel points in L' to form a set TL={t L,i1, …, U }, where t isH,iRepresents THI-th pixel of (1), tL,iRepresents TLU represents the total number of all picture elements;
step 3.2. definition of T according to formula (17) and formula (18)HAnd TLPerforming standardization treatment to obtain
Figure BDA0003037269460000181
And
Figure BDA0003037269460000182
wherein the content of the first and second substances,
Figure BDA0003037269460000183
representing a normalized labeled set of high-spectrum image pixel points,
Figure BDA0003037269460000184
to represent
Figure BDA0003037269460000185
The point of the ith pixel of (a),
Figure BDA0003037269460000186
represents a normalized set of labeled LiDAR pixel points,
Figure BDA0003037269460000187
to represent
Figure BDA0003037269460000188
The ith pixel point of (1);
Figure BDA0003037269460000189
Figure BDA00030372694600001810
Step 3.3, taking each pixel of the normalized hyperspectral set as a center, dividing H' into a series of 11 × 11 hyperspectral pixel blocks that form the hyperspectral image test set; then taking each pixel of the normalized LiDAR set as a center, dividing L' into a series of 11 × 11 LiDAR pixel blocks that form the LiDAR image test set;
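As an illustration of step 3.3 (and the analogous step 2.3), the following NumPy sketch cuts an 11 × 11 block around every pixel; reflection padding at the image border is an assumption, since the border handling is not specified.

```python
import numpy as np

def extract_blocks(image, size=11):
    """Cut one size x size block centered on every pixel of `image`
    (H x W x C for hyperspectral data, H x W x 1 for LiDAR data).
    Border handling by reflection padding is an assumption."""
    pad = size // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w = image.shape[:2]
    blocks = np.empty((h * w, size, size, image.shape[2]), dtype=image.dtype)
    for r in range(h):
        for c in range(w):
            blocks[r * w + c] = padded[r:r + size, c:c + size, :]
    return blocks

# usage: blocks_h = extract_blocks(h_prime); blocks_l = extract_blocks(l_prime[..., None])
```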
Step 3.4, using the sub-networks NfeatureSpe and NfeatureSpa to extract the features of the test set;
Step 3.4.1, using the sub-network NfeatureSpe to extract features from the hyperspectral image test set and obtain the spectral feature Tspe of the hyperspectral image H';
Step 3.4.2, using the sub-network NfeatureSpa to extract features from the hyperspectral image test set and obtain the spatial feature Tspa of the hyperspectral image H';
Step 3.4.3, using the sub-network NfeatureSpa to extract features from the LiDAR image test set and obtain the elevation feature TL of the LiDAR image L';
Step 3.5, using the sub-network Nshallowfusion to perform feature-level shallow fusion and obtain the shallow features;
Step 3.5.1, using the LSAM module to process the spectral feature Tspe and obtain the spectral attention feature TspeA of the hyperspectral image H';
Step 3.5.2, using the LDPAM module to process the spatial feature Tspa and the elevation feature TL and obtain the spatial modal attention feature TmaHF of the hyperspectral image H';
Step 3.5.3, using the LDPAM module to process the elevation feature TL and the spatial feature Tspa and obtain the spatial modal attention feature TmaLF of the LiDAR image L';
Step 3.6, using the sub-network Ndeepfusion to perform feature-level deep fusion and obtain the deep features;
Step 3.6.1, using the maximum pooling layer MaxPool1 to calculate the deep spectral feature of the hyperspectral image H' from the spectral attention feature TspeA;
Step 3.6.2, using the maximum pooling layer MaxPool2 to calculate the deep spatial feature of the hyperspectral image H' from the spatial modal attention feature TmaHF;
Step 3.6.3, using the maximum pooling layer MaxPool2 to calculate the deep elevation feature of the LiDAR image L' from the spatial modal attention feature TmaLF;
Step 3.6.4, using the custom connection layer Concatenate to fuse the deep spectral feature, the deep spatial feature and the deep elevation feature into the deep feature TM;
Step 3.7, using the sub-network Ncls to classify the deep feature TM and calculate the classification prediction result TEpred.
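Putting steps 3.1 to 3.7 together, a minimal inference routine might look as follows; model is a hypothetical trained network with the two-branch input assumed above, and normalize and extract_blocks refer to the illustrative helpers sketched earlier.

```python
import numpy as np
import torch

@torch.no_grad()
def classify_image(model, h_img, l_img, normalize, extract_blocks,
                   batch_size=256, device="cpu"):
    """Steps 3.1-3.7 in schematic form: normalize both images, cut 11 x 11 blocks
    around every pixel, push them through the trained network and take the
    arg-max class of each prediction as the classification map."""
    model.to(device).eval()
    hh, hw, hc = h_img.shape                                   # hyperspectral cube H x W x C
    h_norm = normalize(h_img.reshape(-1, hc)).reshape(hh, hw, hc)   # step 3.2
    l_norm = normalize(l_img.reshape(-1, 1)).reshape(hh, hw, 1)     # single-band LiDAR DSM
    h_blocks = extract_blocks(h_norm)                          # step 3.3
    l_blocks = extract_blocks(l_norm)
    preds = []
    for i in range(0, len(h_blocks), batch_size):              # steps 3.4-3.7
        hb = torch.from_numpy(h_blocks[i:i + batch_size]).float().permute(0, 3, 1, 2)
        lb = torch.from_numpy(l_blocks[i:i + batch_size]).float().permute(0, 3, 1, 2)
        preds.append(model(hb.to(device), lb.to(device)).argmax(dim=1).cpu().numpy())
    return np.concatenate(preds).reshape(hh, hw)
```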
In order to verify the effectiveness of the method, experiments are carried out on the publicly available Houston and Trento data sets. The fusion classification results are evaluated with the overall accuracy (OA), the average accuracy (AA) and the Kappa coefficient as objective indices, and are compared with the SVM, ELM, CNN-PPF, Two-Branch CNN and EndNet methods.
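The three objective indices can be computed from a confusion matrix as in the following sketch, which uses the standard definitions of OA, AA and the Kappa coefficient rather than anything specific to the patent.

```python
import numpy as np

def evaluate(y_true, y_pred, num_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa coefficient
    from integer label arrays, via the confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                                   # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1).clip(min=1))      # mean per-class accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```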
The main challenge of the land-cover classification task is misclassification. For classification based on remote sensing images, the most common error is classifying bare soil as grassland. As can be seen from Table 1, the SVM, ELM, CNN-PPF, Two-Branch CNN and EndNet methods do not fully exploit the interaction among the features acquired by different sensors, so their classification accuracy is limited; the proposed method still shows some confusion among the three grassland classes in different states, but almost no misclassification between grassland as a whole and bare soil. In addition, for the tennis court and running track classes in Table 1 and the wood and vineyard classes in Table 2, the proposed method produces no classification errors and reaches an accuracy of 100%. As can be seen from Table 1, for the Houston data set the OA of the proposed method is 9.57%, 8.14%, 6.73%, 2.08% and 1.54% higher than that of the SVM, ELM, CNN-PPF, Two-Branch CNN and EndNet methods respectively, an average improvement of 5.61%; its Kappa coefficient is 10.26%, 8.79%, 7.36%, 2.26% and 1.65% higher respectively, an average improvement of 6.06%. As can be seen from Table 2, for the Trento data set the OA of the proposed method is 6.26%, 13.22%, 4.27%, 1.11% and 4.86% higher than that of the SVM, ELM, CNN-PPF, Two-Branch CNN and EndNet methods respectively, an average improvement of 5.94%; its Kappa coefficient is 2.85%, 17.34%, 5.66%, 1.89% and 6.48% higher respectively, an average improvement of 6.84%.
FIG. 1 shows the classification results of different methods on the Houston data set, wherein (a) is the HSI pseudo-color image; (b) is the digital surface model derived from the LiDAR image; (c) is the ground-truth classification map; (d) is the classification result of the SVM method, with an overall accuracy of 80.49%; (e) is the classification result of the ELM method, with an overall accuracy of 81.92%; (f) is the classification result of the CNN-PPF method, with an overall accuracy of 83.33%; (g) is the classification result of the Two-Branch CNN method, with an overall accuracy of 87.98%; (h) is the classification result of the EndNet method, with an overall accuracy of 88.52%; (i) is the classification result of the invention, with an overall accuracy of 90.06%.
FIG. 2 shows the classification results of different methods on the Trento data set, wherein (a) is the HSI pseudo-color image; (b) is the digital surface model derived from the LiDAR image; (c) is the ground-truth classification map; (d) is the classification result of the SVM method, with an overall accuracy of 92.77%; (e) is the classification result of the ELM method, with an overall accuracy of 85.81%; (f) is the classification result of the CNN-PPF method, with an overall accuracy of 94.76%; (g) is the classification result of the Two-Branch CNN method, with an overall accuracy of 97.92%; (h) is the classification result of the EndNet method, with an overall accuracy of 94.17%; (i) is the classification result of the invention, with an overall accuracy of 99.03%.
As can be seen from FIG. 1 and FIG. 2, for areas that are difficult to classify, the invention identifies the various ground objects more effectively and, in particular, judges the commercial area in the upper right corner of FIG. 2 relatively accurately. Moreover, because the multi-branch input helps to reduce information loss, the invention obtains smoother and more accurate classification results than the five comparison methods, namely the SVM, ELM, CNN-PPF, Two-Branch CNN and EndNet methods.
The comparisons in Table 1, Table 2, FIG. 1 and FIG. 2 show that fully exploiting the interaction among the data from different sensors effectively improves the fusion quality and classification accuracy of multi-sensor remote sensing images.
Table 1 Classification accuracy comparison on the Houston data set (%)
Table 2 Classification accuracy comparison on the Trento data set (%)

Claims (1)

1. A multi-sensor remote sensing image fusion classification method capable of layering dense fusion network is characterized by comprising the following steps:
step 1, establishing and initializing a convolutional neural network Nahd for multi-sensor remote sensing image fusion and classification, the Nahd comprising 2 sub-networks NfeatureSpe and NfeatureSpa for feature extraction, 1 sub-network Nshallowfusion for shallow feature fusion, 1 sub-network Ndeepfusion for deep feature fusion, and 1 sub-network Ncls for classification;
Step 1.1, establishing and initializing the sub-network NfeatureSpa, which comprises 4 groups of convolutional layers: Conv2_0, Conv2_1, Conv2_2 and Conv2_3;
the Conv2_0 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 3 x 3, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_1 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 3 x 3, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_2 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 1 × 1, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv2_3 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 100 convolution kernels with the size of 1 × 1, each convolution kernel performs convolution operation by taking 1 pixel as a step size, and a nonlinear activation function ReLU is selected as an activation function for operation;
Step 1.2, establishing and initializing the sub-network NfeatureSpe, which comprises 2 groups of convolutional layers: Conv1_0 and Conv1_1;
the Conv1_0 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 64 one-dimensional convolution kernels with the size of 11, each convolution kernel performs convolution operation by taking 1 pixel as a step length, and a nonlinear activation function ReLU is selected as an activation function for operation;
the Conv1_1 comprises 1-layer convolution operation, 1-layer BatchNorm normalization operation and 1-layer activation operation, wherein the convolution layer comprises 128 one-dimensional convolution kernels with the size of 3, each convolution kernel performs convolution operation by taking 1 pixel as a step length, and a nonlinear activation function ReLU is selected as an activation function for operation;
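For illustration, a minimal PyTorch sketch of the two feature-extraction sub-networks of steps 1.1 and 1.2 is given below; the padding choices and input layouts are assumptions, since the claim only fixes the kernel counts, kernel sizes, stride and activation.

```python
import torch
import torch.nn as nn

def conv2d_block(in_ch, out_ch, k):
    """Conv2_x pattern: one 2-D convolution (stride 1), BatchNorm, ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=1, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def conv1d_block(in_ch, out_ch, k):
    """Conv1_x pattern: one 1-D convolution (stride 1), BatchNorm, ReLU."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=k, stride=1, padding=k // 2),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )

class SpatialBranch(nn.Module):
    """Sketch of N_featureSpa: Conv2_0..Conv2_3 with 100 kernels each (3x3, 3x3, 1x1, 1x1)."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            conv2d_block(in_ch, 100, 3), conv2d_block(100, 100, 3),
            conv2d_block(100, 100, 1), conv2d_block(100, 100, 1),
        )
    def forward(self, x):                # x: (B, in_ch, 11, 11) pixel block
        return self.net(x)

class SpectralBranch(nn.Module):
    """Sketch of N_featureSpe: Conv1_0 (64 kernels, size 11) and Conv1_1 (128 kernels, size 3)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv1d_block(1, 64, 11), conv1d_block(64, 128, 3))
    def forward(self, x):                # x: (B, 1, num_bands), the center-pixel spectrum
        return self.net(x)
```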
Step 1.3, establishing and initializing the sub-network Nshallowfusion, which comprises 6 groups of parallel convolutional layers (Conv2_Q1, Conv2_K1, Conv2_V1, Conv2_Q2, Conv2_K2 and Conv2_V2) and 2 custom modules (LSAM and LDPAM);
the Conv2_Q1 comprises 1 layer of convolution operation, which contains 25 convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the Conv2_K1 comprises 1 layer of convolution operation, which contains 25 convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the Conv2_V1 comprises 1 layer of convolution operation, which contains 200 convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the Conv2_Q2 comprises 1 layer of convolution operation, which contains 25 convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the Conv2_K2 comprises 1 layer of convolution operation, which contains 25 convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the Conv2_V2 comprises 1 layer of convolution operation, which contains 200 two-dimensional convolution kernels with the size of 1 × 1, and each convolution kernel carries out the convolution operation with a step size of 1 pixel;
the LSAM module maps the input three-dimensional tensor F to a feature FspeR of size Cspe × N2 with a reshape operation, wherein Cspe denotes the number of input channels and N2 = 1 × 1; it then calculates the spectral attention matrix FspeS according to formula (1), wherein the element in the jth row and ith column of FspeS is obtained from the inner product of the ith channel of FspeR and the transpose of the jth channel of FspeR; finally, FspeR and FspeS are multiplied according to formula (2) to obtain the spectral attention feature FspeA, wherein γ represents a preset coefficient;
the LDPAM module comprises the following 7 steps:
(a) feeding the input three-dimensional tensor F1 into the convolutional layer Conv2_Q1 to calculate the feature FQ1, feeding F1 into the convolutional layer Conv2_K1 to calculate the feature FK1, and feeding F1 into the convolutional layer Conv2_V1 to calculate the feature FV1, wherein FQ1,i, FK1,i and FV1,i respectively denote the ith elements of FQ1, FK1 and FV1, Cspa denotes the number of channels of the input tensor, Hspa and Wspa respectively denote the length and width of the input tensor, and K1 = 25, K2 = 25, K3 = 200;
(b) feeding the input three-dimensional tensor F2 into the convolutional layer Conv2_Q2 to calculate the feature FQ2, feeding F2 into the convolutional layer Conv2_K2 to calculate the feature FK2, and feeding F2 into the convolutional layer Conv2_V2 to calculate the feature FV2, wherein FQ2,i, FK2,i and FV2,i respectively denote the ith elements of FQ2, FK2 and FV2;
(c) mapping FQ1 and FK1 to the corresponding two-dimensional space with a reshape operation and calculating the spatial attention matrix FspaX according to formula (3), wherein N1 denotes the total number of features and N1 = Hspa × Wspa, and the element in the jth row and ith column of FspaX is obtained from the ith element of FQ1 and the transpose of the jth element of FK1;
(d) mapping FV1 to the corresponding two-dimensional space with a reshape operation and calculating the spatial attention feature FspaA according to formula (4), wherein ηspa is a preset scaling factor and the vector formed by the elements of the ith row of FspaX serves as the corresponding weight;
(e) mapping FQ2 and FK2 to the corresponding two-dimensional space with a reshape operation and calculating the modal attention matrix FmX according to formula (5), wherein the element in the jth row and ith column of FmX is obtained from the ith element of FQ2 and the transpose of the jth element of FK2;
(f) calculating the modal attention feature FmA according to formula (6), wherein ε2 denotes a preset scaling factor and the vector formed by the elements of the ith row of FmX serves as the corresponding weight;
(g) calculating the spatially weighted feature FmaF according to formula (7):
FmaF = α1F1 + α2FspaA + α3FmA    (7)
wherein α1, α2 and α3 denote preset weight coefficients;
Step 1.4, establishing and initializing the sub-network Ndeepfusion, which comprises 2 groups of maximum pooling layers and 1 group of custom connection layers, namely MaxPool1, MaxPool2 and Concatenate;
the MaxPool1 comprises 1-layer pooling operation and 1-layer Flatten operation, wherein the pooling layer carries out maximum pooling operation by using a one-dimensional pooling kernel with the size of 1;
the MaxPool2 comprises a 1-layer pooling operation, a 2-layer fully connected operation, a 2-layer activation operation and a 1-layer Flatten operation, wherein the pooling layer performs maximum pooling with a pooling kernel of size 2 × 2, the 2 fully connected layers comprise 1024 and 512 output units respectively with ReLU selected as the activation function, and a Dropout operation with parameter 0.4 is then executed to obtain 3 three-dimensional tensors;
the Concatenate performs, according to formula (8), a fusion operation on the 3 three-dimensional tensors, followed by 3 Dropout operations with parameter 0.5, wherein ω and b denote the weights and biases of the fully connected layer and "|" denotes the operation of connecting the spectral features with the spatial features;
Step 1.5, establishing and initializing the sub-network Ncls, which comprises 1 group of fully connected layers, namely Dense1;
the Dense1 has num classification units and takes Softmax as an activation function, wherein num represents the total number of the ground feature categories to be classified;
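Steps 1.4 and 1.5 can be approximated by the sketch below; the flattened feature sizes, the sharing of one MaxPool2 head between the spatial and elevation branches, the placement of the Dropout layers and the application of Softmax inside the loss are assumptions based on the claim wording rather than the exact formula (8).

```python
import torch
import torch.nn as nn

class DeepFusionHead(nn.Module):
    """Sketch of N_deepfusion + N_cls: MaxPool1 (1-D pooling + flatten) for the
    spectral branch, MaxPool2 (2x2 pooling, FC 1024 -> 512 with ReLU, Dropout 0.4,
    flatten) for the spatial/elevation branches, concatenation with Dropout 0.5,
    and a Dense1 classifier with `num` output units (Softmax applied in the loss)."""
    def __init__(self, spe_dim, spa_dim, num):
        super().__init__()
        self.pool1 = nn.Sequential(nn.MaxPool1d(kernel_size=1), nn.Flatten())
        self.pool2 = nn.Sequential(                 # one shared head as a simplification
            nn.MaxPool2d(2), nn.Flatten(),
            nn.Linear(spa_dim, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 512), nn.ReLU(inplace=True),
            nn.Dropout(0.4),
        )
        self.drop = nn.Dropout(0.5)
        self.cls = nn.Linear(spe_dim + 512 + 512, num)   # Dense1

    def forward(self, f_spe_a, f_ma_hf, f_ma_lf):
        d_spe = self.pool1(f_spe_a)                 # deep spectral feature
        d_spa = self.pool2(f_ma_hf)                 # deep spatial feature
        d_ele = self.pool2(f_ma_lf)                 # deep elevation feature
        fm = self.drop(torch.cat([d_spe, d_spa, d_ele], dim=1))   # Concatenate
        return self.cls(fm)                         # logits; Softmax applied in the loss
```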
step 2, inputting the training set H of the hyperspectral image, the training set L of the LiDAR image, and the artificially marked pixel coordinate set and label set, and training Nahd;
Step 2.1, extracting the set of all labeled pixels XH = {xH,i | i = 1, …, M} from the training set H of the hyperspectral image according to the artificially marked pixel coordinate set, and extracting the set of all labeled pixels XL = {xL,i | i = 1, …, M} from the training set L of the LiDAR image, wherein xH,i denotes the ith pixel of XH, xL,i denotes the ith pixel of XL, and M denotes the total number of labeled pixels;
Step 2.2, normalizing XH and XL according to formula (9) and formula (10) to obtain the normalized set of labeled hyperspectral image pixels and the normalized set of labeled LiDAR pixels, whose ith elements are the normalized counterparts of xH,i and xL,i, respectively;
Step 2.3, taking each pixel of the normalized hyperspectral set as a center, dividing H into a series of 11 × 11 hyperspectral pixel blocks that form the set XH1, and taking each pixel of the normalized LiDAR set as a center, dividing L into a series of 11 × 11 LiDAR pixel blocks that form the set XL1;
Step 2.4, flipping each pixel block in XH1 and XL1 upside down to obtain the hyperspectral pixel block set XH2 and the LiDAR pixel block set XL2;
Step 2.5, adding Gaussian noise with variance 0.01 to each pixel block in XH1 to obtain the hyperspectral pixel block set XH3, and adding Gaussian noise with variance 0.03 to each pixel block in XL1 to obtain the LiDAR pixel block set XL3;
Step 2.6, rotating each pixel block in XH1 clockwise by a random n × 90° about its center point to obtain the hyperspectral pixel block set XH4, and rotating each pixel block in XL1 clockwise by a random n × 90° about its center point to obtain the LiDAR pixel block set XL4, wherein n denotes a value randomly selected from the set {1, 2, 3};
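Steps 2.4 to 2.6 amount to three simple block-level augmentations; the NumPy sketch below illustrates them, taking the noise standard deviation as the square root of the stated variances.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_blocks(blocks):
    """Step 2.4: flip every block (N, h, w, c) upside down."""
    return blocks[:, ::-1, :, :].copy()

def add_gaussian_noise(blocks, variance):
    """Step 2.5: add zero-mean Gaussian noise with the given variance."""
    return blocks + rng.normal(0.0, np.sqrt(variance), size=blocks.shape)

def rotate_blocks(blocks):
    """Step 2.6: rotate every block clockwise by a random n x 90 degrees, n in {1, 2, 3}."""
    out = np.empty_like(blocks)
    for i, blk in enumerate(blocks):
        n = rng.integers(1, 4)                     # n randomly chosen from {1, 2, 3}
        out[i] = np.rot90(blk, k=-n, axes=(0, 1))  # negative k = clockwise rotation
    return out

# usage: x_h2 = flip_blocks(x_h1); x_h3 = add_gaussian_noise(x_h1, 0.01); x_h4 = rotate_blocks(x_h1)
```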
Step 2.7, taking the union of the hyperspectral pixel block sets XH1, XH2, XH3 and XH4 and the union of the LiDAR pixel block sets XL1, XL2, XL3 and XL4 as the training set of the fusion and classification neural network, and organizing the samples in the training set into triples of the form (hyperspectral pixel block, LiDAR pixel block, Yi) as the network input, wherein the hyperspectral pixel block and the LiDAR pixel block of each triple have the same spatial coordinates and Yi denotes their corresponding ground-truth class label; letting the iteration counter iter ← 1 and executing step 2.8 to step 2.13;
Step 2.8, using the sub-networks NfeatureSpe and NfeatureSpa to extract the features of the training set;
Step 2.8.1, using the sub-network NfeatureSpe to extract features from the hyperspectral training set and obtain the shallow spectral feature Fspe of the hyperspectral image;
Step 2.8.2, using the sub-network NfeatureSpa to extract features from the hyperspectral training set and obtain the shallow spatial feature Fspa of the hyperspectral image;
Step 2.8.3, using the sub-network NfeatureSpa to extract features from the LiDAR training set and obtain the shallow elevation feature FL of the LiDAR image;
Step 2.9, using the sub-network Nshallowfusion to perform feature-level shallow fusion and obtain the shallow features;
Step 2.9.1, using the LSAM module to process the shallow spectral feature Fspe and obtain the spectral attention feature FspeA of the hyperspectral image;
Step 2.9.2, using the LDPAM module to process the shallow spatial feature Fspa and the shallow elevation feature FL and obtain the spatial modal attention feature FmaHF of the hyperspectral image;
Step 2.9.3, using the LDPAM module to process the shallow elevation feature FL and the shallow spatial feature Fspa and obtain the spatial modal attention feature FmaLF of the LiDAR image;
Step 2.10, using the sub-network Ndeepfusion to perform feature-level deep fusion and obtain the deep features;
Step 2.10.1, using the maximum pooling layer MaxPool1 to calculate the deep spectral feature of the hyperspectral image from the spectral attention feature FspeA;
Step 2.10.2, using the maximum pooling layer MaxPool2 to calculate the deep spatial feature of the hyperspectral image from the spatial modal attention feature FmaHF;
Step 2.10.3, using the maximum pooling layer MaxPool2 to calculate the deep elevation feature of the LiDAR image from the spatial modal attention feature FmaLF;
Step 2.10.4, using the custom connection layer Concatenate to fuse the deep spectral feature, the deep spatial feature and the deep elevation feature into the deep feature FM;
Step 2.11, using the sub-network Ncls to classify the deep feature FM and calculate the classification prediction result TRpred;
Step 2.12, taking the weighted cross entropy defined by formula (11) and formula (12) as the loss function Lω-C, wherein ωj denotes the weight of the jth class, the probability that a pixel belongs to the jth class of ground objects appears in formula (11), and nj denotes the number of ground-truth samples of the jth class in the training set;
Step 2.13, if all pixel blocks in the training set have been processed, going to step 2.14; otherwise, taking an unprocessed group of pixel blocks from the training set and returning to step 2.8;
Step 2.14, letting iter ← iter + 1; if iter > Total_iter, the trained convolutional neural network Nahd is obtained and the method goes to step 3; otherwise, updating the parameters of Nahd with the back-propagation algorithm based on stochastic gradient descent and the prediction loss Lω-C and returning to step 2.8 to reprocess all pixel blocks in the training set, wherein Total_iter denotes the preset number of iterations;
step 3, inputting the unlabeled hyperspectral image H' and LiDAR image L', performing data preprocessing on all pixels of H' and L', and completing the pixel classification with the trained convolutional neural network Nahd;
Step 3.1, extracting all pixels of H' to form the set TH = {tH,i | i = 1, …, U} and all pixels of L' to form the set TL = {tL,i | i = 1, …, U}, wherein tH,i denotes the ith pixel of TH, tL,i denotes the ith pixel of TL, and U denotes the total number of pixels;
Step 3.2, normalizing TH and TL according to formula (17) and formula (18) to obtain the normalized set of hyperspectral image pixels and the normalized set of LiDAR pixels, whose ith elements are the normalized counterparts of tH,i and tL,i, respectively;
Step 3.3, taking each pixel of the normalized hyperspectral set as a center, dividing H' into a series of 11 × 11 hyperspectral pixel blocks that form the hyperspectral image test set; then taking each pixel of the normalized LiDAR set as a center, dividing L' into a series of 11 × 11 LiDAR pixel blocks that form the LiDAR image test set;
Step 3.4, using the sub-networks NfeatureSpe and NfeatureSpa to extract the features of the test set;
Step 3.4.1, using the sub-network NfeatureSpe to extract features from the hyperspectral image test set and obtain the spectral feature Tspe of the hyperspectral image H';
Step 3.4.2, using the sub-network NfeatureSpa to extract features from the hyperspectral image test set and obtain the spatial feature Tspa of the hyperspectral image H';
Step 3.4.3, using the sub-network NfeatureSpa to extract features from the LiDAR image test set and obtain the elevation feature TL of the LiDAR image L';
Step 3.5, using the sub-network Nshallowfusion to perform feature-level shallow fusion and obtain the shallow features;
Step 3.5.1, using the LSAM module to process the spectral feature Tspe and obtain the spectral attention feature TspeA of the hyperspectral image H';
Step 3.5.2, using the LDPAM module to process the spatial feature Tspa and the elevation feature TL and obtain the spatial modal attention feature TmaHF of the hyperspectral image H';
Step 3.5.3, using the LDPAM module to process the elevation feature TL and the spatial feature Tspa and obtain the spatial modal attention feature TmaLF of the LiDAR image L';
Step 3.6, using the sub-network Ndeepfusion to perform feature-level deep fusion and obtain the deep features;
Step 3.6.1, using the maximum pooling layer MaxPool1 to calculate the deep spectral feature of the hyperspectral image H' from the spectral attention feature TspeA;
Step 3.6.2, using the maximum pooling layer MaxPool2 to calculate the deep spatial feature of the hyperspectral image H' from the spatial modal attention feature TmaHF;
Step 3.6.3, using the maximum pooling layer MaxPool2 to calculate the deep elevation feature of the LiDAR image L' from the spatial modal attention feature TmaLF;
Step 3.6.4, using the custom connection layer Concatenate to fuse the deep spectral feature, the deep spatial feature and the deep elevation feature into the deep feature TM;
Step 3.7, using the sub-network Ncls to classify the deep feature TM and calculate the classification prediction result TEpred.
CN202110446906.6A 2021-04-25 2021-04-25 Multi-sensor remote sensing image fusion classification method capable of layering dense fusion network Withdrawn CN113255727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110446906.6A CN113255727A (en) 2021-04-25 2021-04-25 Multi-sensor remote sensing image fusion classification method capable of layering dense fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110446906.6A CN113255727A (en) 2021-04-25 2021-04-25 Multi-sensor remote sensing image fusion classification method capable of layering dense fusion network

Publications (1)

Publication Number Publication Date
CN113255727A true CN113255727A (en) 2021-08-13

Family

ID=77221568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110446906.6A Withdrawn CN113255727A (en) 2021-04-25 2021-04-25 Multi-sensor remote sensing image fusion classification method capable of layering dense fusion network

Country Status (1)

Country Link
CN (1) CN113255727A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887645A (en) * 2021-10-13 2022-01-04 西北工业大学 Remote sensing image fusion classification method based on joint attention twin network
CN113887645B (en) * 2021-10-13 2024-02-13 西北工业大学 Remote sensing image fusion classification method based on joint attention twin network
CN113920323A (en) * 2021-11-18 2022-01-11 西安电子科技大学 Different-chaos hyperspectral image classification method based on semantic graph attention network
CN113920323B (en) * 2021-11-18 2023-04-07 西安电子科技大学 Different-chaos hyperspectral image classification method based on semantic graph attention network
CN114565858A (en) * 2022-02-25 2022-05-31 辽宁师范大学 Multispectral image change detection method based on geospatial perception low-rank reconstruction network
CN114565858B (en) * 2022-02-25 2024-04-05 辽宁师范大学 Multispectral image change detection method based on geospatial perception low-rank reconstruction network
CN114663777A (en) * 2022-03-07 2022-06-24 辽宁师范大学 Hyperspectral image change detection method based on spatio-temporal joint graph attention mechanism
CN114663777B (en) * 2022-03-07 2024-04-05 辽宁师范大学 Hyperspectral image change detection method based on space-time joint graph attention mechanism
CN114663779A (en) * 2022-03-25 2022-06-24 辽宁师范大学 Multi-temporal hyperspectral image change detection method based on time-space-spectrum attention mechanism
CN114581838A (en) * 2022-04-26 2022-06-03 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device and cloud equipment
CN116051896A (en) * 2023-01-28 2023-05-02 西南交通大学 Hyperspectral image classification method of lightweight mixed tensor neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210813