WO2024040828A1 - Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image - Google Patents

Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image

Info

Publication number
WO2024040828A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
hyperspectral
input
output
lidar
Prior art date
Application number
PCT/CN2022/142160
Other languages
French (fr)
Chinese (zh)
Inventor
于文博
黄鹤
沈纲祥
Original Assignee
苏州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州大学
Publication of WO2024040828A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/58Extraction of image or video features relating to hyperspectral data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The invention relates to the technical field of remote sensing image processing, and in particular to a method and device for fusion classification of remote sensing hyperspectral images and lidar images.
  • Hyperspectral images have rich spatial and spectral information: the spatial information is the spatial position of pixels at each wavelength, and the spectral information is the spectral curve composed of the spectral reflectance of a single pixel across wavelengths.
  • Lidar images record the elevation information of target objects.
  • The fusion classification methods for hyperspectral images and lidar images in the field of remote sensing can generally be divided into methods based on classic machine learning and methods based on deep learning.
  • Methods based on classic machine learning use the spatial and spectral information in hyperspectral images and the elevation information in lidar images to construct feature extraction modules and fusion modules, achieving a joint representation of the different remote sensing images.
  • The more commonly used machine learning theories include Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), and Linear Discriminant Analysis (LDA).
  • Other machine learning methods such as manifold learning algorithms, structural sparsification algorithms, dictionary set decomposition algorithms, etc. also play an important role.
  • This type of method usually extracts discriminative information from hyperspectral images and lidar images, and preserves the separability of samples by fusing the different kinds of information.
  • Some deep network models have also been applied to the fusion classification of hyperspectral and lidar images, such as the auto-encoder (AE), the variational auto-encoder (VAE), and the long short-term memory network (LSTM).
  • For example, Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data, published in IEEE Geoscience and Remote Sensing Letters in 2020, proposed a fully connected network based on an encoder-decoder structure that extracts the features of the hyperspectral image and the lidar image separately and fuses them, realizing the reconstruction of feature information and its transmission to a deeper embedding space.
  • However, the existing fusion classification methods for hyperspectral and lidar images in the field of remote sensing have certain shortcomings: (1) existing methods do not take into account the correlation between the illumination information of hyperspectral images and the elevation information of lidar images, so it is difficult to achieve a deep fusion of the two, which weakens the performance of the classification model; (2) existing methods do not apply the illumination information of hyperspectral images to the construction of the fusion classification model, and do not consider decomposing the hyperspectral image into an intrinsic image and an illumination image so as to exploit the advantages of both.
  • The technical problem to be solved by the present invention is to overcome the problems of the existing technology by proposing a method and device for fusion classification of remote sensing hyperspectral images and lidar images that can fully fuse the important discriminative information in multi-source remote sensing images, achieve high-precision classification of target pixels, avoid the loss and attrition of important information during fusion, and reduce problems such as decreased classification accuracy caused by missing information.
  • To solve the above technical problems, the present invention provides a fusion classification method for remote sensing hyperspectral images and lidar images, including:
  • S1: Acquire a hyperspectral image and a lidar image; the categories of the ground objects in the two images are denoted label;
  • S2: Perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image. For each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
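A minimal sketch of the neighborhood-block extraction in S2, assuming reflect padding at the image borders and row-major pixel ordering (neither is specified in the patent); the array and function names are illustrative only.

```python
import numpy as np

def extract_patches(image: np.ndarray, s: int) -> np.ndarray:
    """Return the s x s neighborhood block of every pixel.

    image: (X, Y) lidar map or (X, Y, B) hyperspectral cube;
    returns (X*Y, s, s) or (X*Y, s, s, B) patches.
    """
    r = s // 2
    squeeze = image.ndim == 2
    if squeeze:
        image = image[:, :, None]
    X, Y, B = image.shape
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches = np.empty((X * Y, s, s, B), dtype=image.dtype)
    for i in range(X):
        for j in range(Y):
            patches[i * Y + j] = padded[i:i + s, j:j + s, :]
    return patches[..., 0] if squeeze else patches
```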
  • S3: Train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
  • S4: Use the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
  • S5: Input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; input O_12^2 and O_3456^3 to the 13th to obtain O_12^3; then input O_3456^4 and O_12^3, each of size s×s×d, into a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
  • S6: Input O_12^4 and O_3456^5 into the concatenation layer to obtain the output O_123456 of size 1×2d, and input O_123456 into the fully connected layer to obtain the final output category \hat{label}.
  • In step S1, after the hyperspectral image and the lidar image are selected, normalization preprocessing is performed on both images.
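The patent calls for normalization preprocessing without fixing the exact scheme; a per-band min-max scaling to [0, 1], one common choice for hyperspectral and lidar data, might look like this:

```python
import numpy as np

def minmax_normalize(image: np.ndarray) -> np.ndarray:
    """Scale each band of an (X, Y, B) cube, or a single (X, Y) band, to [0, 1]."""
    img = image.astype(np.float64)
    flat = img.reshape(-1, img.shape[-1]) if img.ndim == 3 else img.reshape(-1, 1)
    mins = flat.min(axis=0)
    spans = flat.max(axis=0) - mins
    return ((flat - mins) / (spans + 1e-12)).reshape(img.shape)
```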
  • The method of decomposing the hyperspectral image into an intrinsic image and an illumination image in step S2 includes computing, for each hyperspectral pixel H_i with 1 ≤ i ≤ X×Y, the matrix
    D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)},
    where I_B is the identity matrix of size B×B, and solving
    min ||α_i||_1  s.t.  H_i = D_i α_i,
    where α_i has shape (B+X×Y-1)×1;
  • The deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in step S3 each include multiple two-dimensional convolution layers; each layer has d convolution kernels of size [3, 3] with sliding stride [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
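A minimal PyTorch sketch of the branch structure just described: stacks of 2-D convolutions with d kernels of size 3×3 and stride 1. The ReLU activations, the padding of 1 (to keep the spatial size at s×s), the use of three layers, and the tiling of the single-band s×s lidar patch to B channels (which makes the stated s×s×B lidar input and the L_2/L_3 weight sharing dimensionally consistent) are assumptions, not details confirmed by the patent.

```python
import torch
import torch.nn as nn

def make_branch(in_channels: int, d: int, num_layers: int = 3) -> nn.Sequential:
    """One deep network branch: num_layers conv layers, d kernels of 3x3, stride 1."""
    layers, c = [], in_channels
    for _ in range(num_layers):
        layers += [nn.Conv2d(c, d, kernel_size=3, stride=1, padding=1), nn.ReLU()]
        c = d
    return nn.Sequential(*layers)

B, d, s = 63, 120, 11                                # Trento-sized example values
L1 = make_branch(B, d)
L2 = make_branch(B, d); L3 = L2                      # L2 and L3 share all weights
L4 = make_branch(B, d); L5 = L4                      # L4 and L5 share all weights
L6 = make_branch(B, d)

intrinsic = torch.randn(8, B, s, s)                  # intrinsic-image patches
lidar = torch.randn(8, 1, s, s).repeat(1, B, 1, 1)   # lidar patch tiled to B channels (assumption)
illum = torch.randn(8, B, s, s)                      # illumination-image patches
O1, O2 = L1(intrinsic), L2(intrinsic)
O3, O4 = L3(lidar), L4(lidar)
O5, O6 = L5(illum), L6(illum)
print(O1.shape)  # torch.Size([8, 120, 11, 11]) -> s x s x d per sample
```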
  • The loss function used when training the deep network branches in step S3 compares label, the ground-truth category of the input sample, with \hat{label}, the category predicted by the network.
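The exact loss formula is not reproduced legibly in this text; a standard cross-entropy between label and \hat{label}, shown below purely as a stand-in assumption, is the usual choice for this kind of classification training.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                 # assumed stand-in, not the patent's formula
logits = torch.randn(8, 6, requires_grad=True)    # predicted scores: 8 samples, 6 classes
labels = torch.randint(0, 6, (8,))                # ground-truth categories (label)
loss = criterion(logits, labels)
loss.backward()
```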
  • In step S4, the concatenation layer concatenates the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise: O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4), O_56 = Concatenation(O_5, O_6).
  • The present invention also provides a remote sensing hyperspectral image and lidar image fusion classification device, including:
  • A data acquisition module, used to acquire a hyperspectral image and a lidar image; the categories of the ground objects in the two images are denoted label;
  • An image decomposition module, used to perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, to select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
  • A deep network training module, used to train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
  • An image concatenation module, which uses the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
  • A multimodal fusion module, used to input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; and input O_12^2 and O_3456^3 to the 13th to obtain O_12^3;
  • O_3456^4 and O_12^3, each of size s×s×d, are input into the two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
  • An image classification module, which inputs O_12^4 and O_3456^5 into the concatenation layer to obtain the output O_123456 of size 1×2d, and inputs O_123456 into the fully connected layer to obtain the final output category \hat{label}.
  • The data acquisition module includes a data preprocessing submodule, which is used to perform normalization preprocessing on the hyperspectral image and the lidar image after they are selected.
  • The image decomposition module performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, including computing, for each hyperspectral pixel H_i with 1 ≤ i ≤ X×Y, the matrix
    D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)},
    where I_B is the identity matrix of size B×B, and solving
    min ||α_i||_1  s.t.  H_i = D_i α_i,
    where α_i has shape (B+X×Y-1)×1;
  • The deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each include multiple two-dimensional convolution layers; each layer has d convolution kernels of size [3, 3] with sliding stride [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
  • The fusion classification method for remote sensing hyperspectral images and lidar images proposed by this invention can fully fuse the important discriminative information in multi-source remote sensing images, achieve high-precision classification of target pixels, avoid the loss and attrition of important information during fusion, and reduce problems such as decreased classification accuracy caused by missing information;
  • The present invention applies the intrinsic image decomposition theory of hyperspectral images to the fusion classification of hyperspectral and lidar images, fully combining intrinsic image decomposition with multimodal remote sensing image fusion classification, and avoids the common practice of discarding the decomposed illumination image during intrinsic image decomposition, reducing the loss of information;
  • The present invention proposes a method for fully fusing the illumination image obtained by decomposing the hyperspectral image with the lidar image, so that the correlation between the illumination information in the hyperspectral image and the elevation information in the lidar image is fully explored and utilized, giving full play to the advantages of illumination images in model construction and improving the final classification performance.
  • Figure 1 is a flow chart of a fusion classification method of remote sensing hyperspectral images and lidar images provided by the present invention.
  • Figure 2 is a schematic framework diagram of a remote sensing hyperspectral image and lidar image fusion classification device provided by the present invention.
  • the reference numbers are as follows: 10. Data acquisition module; 20. Image decomposition module; 30. Deep network training module; 40. Image splicing module; 50. Multi-modal fusion module; 60. Image classification module.
  • An embodiment of the present invention provides a method for fusion and classification of remote sensing hyperspectral images and lidar images, including:
  • S1: Acquire a hyperspectral image and a lidar image; the categories of the ground objects in the two images are denoted label;
  • S2: Perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image. For each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
  • S3: Train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
  • S4: Use the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
  • S5: Input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; input O_12^2 and O_3456^3 to the 13th to obtain O_12^3; then input O_3456^4 and O_12^3, each of size s×s×d, into a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
  • S6: Input O_12^4 and O_3456^5 into the concatenation layer to obtain the output O_123456 of size 1×2d, and input O_123456 into the fully connected layer to obtain the final output category \hat{label}.
  • The hyperspectral image H and the lidar image L are selected according to the actual problem, where the hyperspectral image has size X×Y×B, X and Y being the spatial dimensions of the hyperspectral image in each band and B the number of bands of the hyperspectral image.
  • The lidar image has size X×Y, where X and Y are the spatial dimensions of the lidar image; the spatial dimensions of the two images are the same.
  • In step S2, the method of decomposing the hyperspectral image into an intrinsic image and an illumination image includes computing, for each hyperspectral pixel H_i with 1 ≤ i ≤ X×Y, the matrix
    D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)},
    where I_B is the identity matrix of size B×B, and solving
    min ||α_i||_1  s.t.  H_i = D_i α_i,
    where α_i has shape (B+X×Y-1)×1;
  • In step S2, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel (each of the three images contains X×Y pixels), the surrounding neighborhood of size s×s is selected as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image H has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s.
  • In step S3, six deep network branches are first constructed, namely L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image RE, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image L, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image SH.
  • Each of the six deep network branches is composed of three two-dimensional convolution layers; each layer has d convolution kernels of size [3, 3] with sliding stride [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
  • The outputs of the six deep network branches are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d.
  • The loss function used when training the deep network branches in step S3 compares label, the ground-truth category of the input sample, with \hat{label}, the category predicted by the network.
  • In step S4, the concatenation layer (Concatenation Layer) is used to concatenate the outputs of the six deep network branches pairwise, according to the following formulas: O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4), O_56 = Concatenation(O_5, O_6).
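Pairwise concatenation happens along the channel dimension; for PyTorch tensors of shape (batch, d, s, s) this is simply:

```python
import torch

d, s = 120, 11
O1, O2 = torch.randn(8, d, s, s), torch.randn(8, d, s, s)
O12 = torch.cat([O1, O2], dim=1)   # shape (8, 2d, s, s), i.e. s x s x 2d per sample
```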
  • In step S5, Concatenation(O_34, O_56) is input to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; O_34 is input to the 2nd such layer to obtain O_34^1; O_56 is input to the 3rd to obtain O_56^1; Concatenation(O_34^1, O_3456^1, O_56^1) is input to the 4th to obtain O_3456^2; O_34^1 is input to the 5th to obtain O_34^2; O_56^1 is input to the 6th to obtain O_56^2; Concatenation(O_34^2, O_3456^2, O_56^2) is input to the 7th to obtain O_3456^3; O_34^2 is input to the 8th to obtain O_34^3; O_56^2 is input to the 9th to obtain O_56^3; Concatenation(O_34^3, O_3456^3, O_56^3) is input to the 10th to obtain O_3456^4; Concatenation(O_12, O_3456^1) is input to the 11th to obtain O_12^1; Concatenation(O_12^1, O_3456^2) is input to the 12th to obtain O_12^2; Concatenation(O_12^2, O_3456^3) is input to the 13th to obtain O_12^3 of size s×s×d. O_3456^4 and O_12^3 are then input into a two-dimensional average pooling layer (Average Pooling Layer) of size s×s, yielding O_3456^5 and O_12^4 of size 1×d.
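A hedged PyTorch sketch of the S5/S6 cascade above. The internal structure of a "multimodal grouped convolution layer" is not spelled out in this text, so each one is modeled here as a grouped 2-D convolution (groups=2, kernel 3×3, padding 1, output width d); those choices, and the batch of 1, are assumptions for shape-checking only.

```python
import torch
import torch.nn as nn

d, s, C = 120, 11, 6    # feature width, patch size, number of classes

def mgc(in_ch: int, out_ch: int = d) -> nn.Conv2d:
    """Stand-in for one multimodal grouped convolution layer."""
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=2)

cat = lambda *ts: torch.cat(ts, dim=1)
O12, O34, O56 = (torch.randn(1, 2 * d, s, s) for _ in range(3))

O3456_1 = mgc(4 * d)(cat(O34, O56))                       # layer 1
O34_1, O56_1 = mgc(2 * d)(O34), mgc(2 * d)(O56)           # layers 2-3
O3456_2 = mgc(3 * d)(cat(O34_1, O3456_1, O56_1))          # layer 4
O34_2, O56_2 = mgc(d)(O34_1), mgc(d)(O56_1)               # layers 5-6
O3456_3 = mgc(3 * d)(cat(O34_2, O3456_2, O56_2))          # layer 7
O34_3, O56_3 = mgc(d)(O34_2), mgc(d)(O56_2)               # layers 8-9
O3456_4 = mgc(3 * d)(cat(O34_3, O3456_3, O56_3))          # layer 10
O12_1 = mgc(3 * d)(cat(O12, O3456_1))                     # layer 11
O12_2 = mgc(2 * d)(cat(O12_1, O3456_2))                   # layer 12
O12_3 = mgc(2 * d)(cat(O12_2, O3456_3))                   # layer 13
pool = nn.AvgPool2d(s)                                    # s x s average pooling
O3456_5, O12_4 = pool(O3456_4).flatten(1), pool(O12_3).flatten(1)   # each (1, d)
O123456 = torch.cat([O12_4, O3456_5], dim=1)              # (1, 2d)  -- step S6
logits = nn.Linear(2 * d, C)(O123456)                     # final category scores
```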
  • The fusion classification method for remote sensing hyperspectral images and lidar images proposed by the present invention efficiently combines intrinsic image decomposition theory with multimodal remote sensing image fusion classification, gives full play to the advantages of illumination images in model construction, and reduces the loss and attrition of important discriminative information.
  • This invention introduces the intrinsic image decomposition of hyperspectral images into multimodal remote sensing image fusion classification for the first time, balancing the illumination information of the illumination image with the elevation information of the lidar image: it fuses the illumination and elevation information while guiding the intrinsic information mining process, thereby improving the separability of samples.
  • The present invention proposes a method for fully fusing the illumination image obtained by decomposing the hyperspectral image with the lidar image, so that the correlation between the illumination information in the hyperspectral image and the elevation information in the lidar image is fully explored and utilized, taking advantage of illumination images in model construction.
  • The present invention proposes a discriminative feature extraction method for hyperspectral and lidar images, which greatly improves the correlation between the two and reduces information imbalance during the information mining process.
  • The present invention applies the multimodal grouped convolution layer to the fusion classification of hyperspectral and lidar images, giving full play to its application value in this field, greatly improving the ability to combine different modalities, reducing unnecessary redundant information, and enhancing the expression of important information.
  • The hyperspectral image and lidar image used in the fusion classification method proposed by the present invention were taken in Trento, Italy; the hyperspectral image has size 166×600×63 and the lidar image has size 166×600.
  • The input hyperspectral image has size 166×600×63, and the input lidar image has size 166×600.
  • The neighborhood size is 11, and the number of convolution kernels in each two-dimensional convolution layer is 120.
  • The hyperspectral image is decomposed to obtain an intrinsic image of size 166×600×63 and an illumination image of size 166×600×63.
  • Neighborhood information is then selected: a neighborhood block of size 11×11×63 is obtained for each pixel, and the neighborhood blocks are input into the deep network for training.
  • The overall classification accuracy and the average classification accuracy were used to evaluate the classification results.
  • The overall classification accuracy is the number of correctly classified samples divided by the total number of samples.
  • The average classification accuracy first computes, for each category, the ratio of the number of correctly classified samples in that category to the number of samples in that category, and then averages these ratios over all categories.
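The two metrics can be computed directly; a short sketch (class labels assumed to be integers):

```python
import numpy as np

def overall_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Correctly classified samples divided by all samples."""
    return float(np.mean(y_true == y_pred))

def average_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Per-class accuracy, averaged over the classes present in y_true."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))
```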
  • The method of the present invention can better fuse and classify hyperspectral and lidar images, with fewer misclassified samples.
  • When the hyperspectral image decomposition part of the method of the present invention is removed and the above experiment is repeated, the overall classification accuracy obtained is 85.49%.
  • This shows that the method of the present invention has strong information mining capability. In summary, the method of the present invention can effectively improve the classification ability and classification accuracy for multi-source remote sensing images.
  • The following introduces a remote sensing hyperspectral image and lidar image fusion classification device disclosed in a second embodiment of the present invention; the device described below and the fusion classification method described above may be referred to in correspondence with each other.
  • An embodiment of the present invention provides a remote sensing hyperspectral image and lidar image fusion classification device, which includes:
  • The data acquisition module 10 is used to acquire a hyperspectral image and a lidar image; the categories of the ground objects in the two images are denoted label;
  • The image decomposition module 20 is used to perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, to select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
  • The deep network training module 30 is used to train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
  • The image concatenation module 40 uses the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
  • The multimodal fusion module 50 is used to input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; and input O_12^2 and O_3456^3 to the 13th to obtain O_12^3. O_3456^4 and O_12^3, each of size s×s×d, are then input into the two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
  • The image classification module 60 inputs O_12^4 and O_3456^5 into the concatenation layer to obtain the output O_123456 of size 1×2d, and inputs O_123456 into the fully connected layer to obtain the final output category \hat{label}.
  • The data acquisition module 10 includes a data preprocessing submodule, which is used to perform normalization preprocessing on the hyperspectral image and the lidar image after they are selected.
  • The image decomposition module 20 performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, including computing, for each hyperspectral pixel H_i with 1 ≤ i ≤ X×Y, the matrix
    D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)},
    where I_B is the identity matrix of size B×B, and solving
    min ||α_i||_1  s.t.  H_i = D_i α_i,
    where α_i has shape (B+X×Y-1)×1;
  • The deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each include multiple two-dimensional convolution layers; each layer has d convolution kernels of size [3, 3] with sliding stride [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
  • The remote sensing hyperspectral image and lidar image fusion classification device of this embodiment is used to implement the aforementioned remote sensing hyperspectral image and lidar image fusion classification method, so its specific implementation can be found in the description of the corresponding method embodiments above; since its functions correspond to those of the method, they are not repeated here.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a method for fusion and classification of a remote sensing hyperspectral image and a laser radar image, comprising: acquiring a hyperspectral image and a laser radar image; performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and selecting neighborhood blocks for each hyperspectral intrinsic pixel, each hyperspectral illumination pixel, and each laser radar pixel; training a plurality of deep network branches using the neighborhood blocks; concatenating the outputs of the deep network branches pairwise using a concatenation layer; and performing multimodal fusion on the concatenated outputs to obtain a final output category. The method can fully fuse the important discriminative information in multi-source remote sensing images, achieving high-precision classification of target pixels; the missing and loss of important information during fusion are avoided, reducing problems such as decreased classification precision caused by information loss.

Description

Remote sensing hyperspectral image and lidar image fusion classification method and device

Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a method and device for fusion classification of remote sensing hyperspectral images and lidar images.
Background
In the field of remote sensing, hyperspectral images and lidar images are widely used in various related research. Hyperspectral images have rich spatial and spectral information: the spatial information is the spatial position of pixels at each wavelength, and the spectral information is the spectral curve composed of the spectral reflectance of a single pixel across wavelengths. Lidar images record the elevation information of target ground objects. By fully fusing hyperspectral images and lidar images, complementary information can be obtained, so that the complete information of ground objects can be learned and modeled. At the same time, by fusing and classifying the two types of remote sensing images, the characteristics embedded in each pixel can be fully mined, improving the recognition accuracy of subsequent classification research. Early fusion classification methods usually used two independent branches to extract features from the two images and fused the multi-source information through simple connections; such methods did not consider the correlation between the branches and struggled to balance the multi-source information. With the growth of computing power and the deepening of deep learning research, methods that fully fuse hyperspectral and lidar images by training neural networks have been proposed; these methods improve the information extraction process for the different images, strengthen their correlation, and improve algorithm performance.
At present, fusion classification methods for hyperspectral images and lidar images in the field of remote sensing can generally be divided into methods based on classic machine learning and methods based on deep learning. Methods based on classic machine learning use the spatial and spectral information in hyperspectral images and the elevation information in lidar images to construct feature extraction modules and fusion modules, thereby achieving a joint representation of the different remote sensing images. Commonly used machine learning theories include Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), and Linear Discriminant Analysis (LDA). Other machine learning methods, such as manifold learning, structured sparsity, and dictionary decomposition algorithms, also play an important role. These methods usually extract the discriminative information in hyperspectral and lidar images and preserve the separability of samples by fusing the different kinds of information. With the continuous deepening of deep learning theory, deep network models such as the auto-encoder (AE), the variational auto-encoder (VAE), and the long short-term memory network (LSTM) have also been applied to the fusion classification of hyperspectral and lidar images. These methods use complex network structures to extract deep discriminative information and describe the discriminative features of samples from multiple aspects, so more and more deep-learning-based fusion classification methods have been proposed. For example, Danfeng Hong et al., in Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data (IEEE Geoscience and Remote Sensing Letters, 2020), proposed a fully connected network based on an encoder-decoder structure that extracts the features of the hyperspectral image and the lidar image separately and fuses them, realizing the reconstruction of feature information and its transmission to a deeper embedding space.
In addition, in More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification (IEEE Transactions on Geoscience and Remote Sensing, also 2020), they proposed a deep learning framework for multimodal data that performs secondary learning of the complementary information between multimodal images through cross-selection of parameters during network training. Deep learning has thus been widely applied to the fusion classification of hyperspectral and lidar images in the field of remote sensing and has achieved excellent results.
However, the existing fusion classification methods for hyperspectral and lidar images in the field of remote sensing have certain shortcomings: (1) existing methods do not take into account the correlation between the illumination information of hyperspectral images and the elevation information of lidar images, so it is difficult to achieve a deep fusion of the two, which weakens the performance of the classification model; (2) existing methods do not apply the illumination information of hyperspectral images to the construction of the fusion classification model and do not consider decomposing the hyperspectral image into an intrinsic image and an illumination image so as to exploit the advantages of both; some methods attempt to introduce intrinsic decomposition theory into the classification model, but they all directly discard the decomposed illumination image and use only the intrinsic image and the lidar image for fusion classification, failing to exploit the advantages of multimodal remote sensing images; (3) when extracting discriminative information from hyperspectral and lidar images, existing methods rarely consider the joint and collaborative capabilities between the two and use only completely separate branches for information mining and feature extraction, which is not conducive to fully grasping the complete information of a pixel and makes it difficult to exploit the advantages of multimodal remote sensing images in pixel classification and recognition; (4) existing methods often use convolutional neural networks to extract spatial image information, but conventional convolutional neural networks do not consider the constraints of multimodal learning and have little structural design for fusing information between images of different modalities, which is not conducive to improving the fusion classification accuracy of hyperspectral and lidar images.
Summary of the Invention
To this end, the technical problem to be solved by the present invention is to overcome the problems of the existing technology by proposing a method and device for fusion classification of remote sensing hyperspectral images and lidar images that can fully fuse the important discriminative information in multi-source remote sensing images, achieve high-precision classification of target pixels, avoid the loss and attrition of important information during fusion, and reduce problems such as decreased classification accuracy caused by missing information.
To solve the above technical problems, the present invention provides a fusion classification method for remote sensing hyperspectral images and lidar images, including:
S1: Acquire a hyperspectral image and a lidar image; the categories of the ground objects in the two images are denoted label;
S2: Perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image. For each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
S3: Train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
S4: Use the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
S5: Input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; input O_12^2 and O_3456^3 to the 13th to obtain O_12^3; then input O_3456^4 and O_12^3, each of size s×s×d, into a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
S6: Input O_12^4 and O_3456^5 into the concatenation layer to obtain the output O_123456 of size 1×2d, and input O_123456 into the fully connected layer to obtain the final output category \hat{label}.
In one embodiment of the present invention, in step S1, after the hyperspectral image and the lidar image are selected, normalization preprocessing is performed on both images.
In one embodiment of the present invention, the method of decomposing the hyperspectral image into an intrinsic image and an illumination image in step S2 includes:

S2.1: For each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y, compute the matrix

D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)}

where I_B is the identity matrix of size B×B;

S2.2: Based on D_i, compute the vector α_i corresponding to each hyperspectral pixel H_i:

min ||α_i||_1  s.t.  H_i = D_i α_i

where α_i has shape (B+X×Y-1)×1;

S2.3: Construct a weight matrix W ∈ R^{(X×Y)×(X×Y)}, assigning the element W_ij in row i and column j according to the sparse coefficients α_i; based on W, compute the matrix G = (I_{X×Y} - W^T)(I_{X×Y} - W) + δI_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and T denotes the transpose;

S2.4: Flatten the hyperspectral image H into a two-dimensional matrix and take the logarithm to obtain log(flatten(H)); compute the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_{X×Y} - 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of sizes B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of sizes B×1 and (X×Y)×1, respectively;

S2.5: Based on G and K, compute the matrix ρ = δKG^{-1}; from ρ, obtain the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)-ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
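A hedged NumPy sketch of steps S2.3 to S2.5, taking the weight matrix W of S2.3 as given (its element-wise assignment from the sparse codes α_i appears only as an image in the filing). This is a direct transcription for shape-checking, not an optimized implementation: inverting the (X×Y)×(X×Y) matrix G is only feasible for small images, and the default value of δ below is illustrative.

```python
import numpy as np

def intrinsic_decompose(H: np.ndarray, W: np.ndarray, delta: float = 1e-3):
    """H: (X, Y, B) hyperspectral cube; W: (X*Y, X*Y) weight matrix from S2.3."""
    X, Y, B = H.shape
    N = X * Y
    I_N, I_B = np.eye(N), np.eye(B)
    G = (I_N - W.T) @ (I_N - W) + delta * I_N                                   # S2.3
    logH = np.log(H.reshape(N, B).T)                                            # (B, N): log(flatten(H))
    ones_B, ones_N = np.ones((B, 1)), np.ones((N, 1))
    K = (I_B - ones_B @ ones_B.T / B) @ logH @ (I_N - ones_N @ ones_N.T / N)    # S2.4
    rho = delta * K @ np.linalg.inv(G)                                          # S2.5: rho = delta*K*G^-1
    RE = np.exp(rho.T.reshape(X, Y, B))                                         # intrinsic image
    SH = np.exp(np.log(H) - rho.T.reshape(X, Y, B))                             # illumination image
    return RE, SH
```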
In one embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in step S3 each include multiple two-dimensional convolution layers; each layer has d convolution kernels of size [3, 3] with sliding stride [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
In one embodiment of the present invention, the loss function used when training the deep network branches in step S3 compares label, the ground-truth category of the input sample, with \hat{label}, the category predicted by the network.
In one embodiment of the present invention, in step S4 the concatenation layer concatenates the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise according to O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4), O_56 = Concatenation(O_5, O_6).
In addition, the present invention also provides a remote sensing hyperspectral image and lidar image fusion classification device, including:
a data acquisition module, used to acquire a hyperspectral image and a lidar image, where the categories of the ground objects in the two images are denoted label;
an image decomposition module, used to perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, to select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of a lidar pixel in the lidar image L has size s×s;
a deep network training module, used to train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; their outputs are O_1, O_2, O_3, O_4, O_5 and O_6 respectively, each of size s×s×d;
an image concatenation module, which uses the concatenation layer to concatenate the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise, obtaining O_12, O_34 and O_56;
a multimodal fusion module, used to input O_34 and O_56 to the 1st multimodal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd such layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; input O_12^2 and O_3456^3 to the 13th to obtain O_12^3; and then input O_3456^4 and O_12^3, each of size s×s×d, into a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
图像分类模块,将O 12 4和O 3456 5输入拼接层得到输出O 123456,其尺寸为1×2d,将O 123456输入全连接层,得到最终输出的类别为
Figure PCTCN2022142160-appb-000005
Image classification module, input O 12 4 and O 3456 5 into the splicing layer to get the output O 123456 , whose size is 1×2d, input O 123456 into the fully connected layer, and get the final output category:
Figure PCTCN2022142160-appb-000005
In one embodiment of the present invention, the data acquisition module includes a data preprocessing submodule, which normalizes the hyperspectral image and the lidar image after they are selected.
In one embodiment of the present invention, the image decomposition module performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, as follows:
Calculate the matrix D_i corresponding to each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y:
D_i = [H_1, …, H_{i−1}, H_{i+1}, …, H_{X×Y}, I_B] ∈ R^{B×(B+X×Y−1)}
where I_B is the identity matrix of size B×B;
Based on D_i, calculate the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
where α_i has shape (B+X×Y−1)×1;
Construct a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assign each element W_ij in row i, column j (assignment formula, image PCTCN2022142160-appb-000006); based on W, compute the matrix G = (I_{X×Y} − W^T)(I_{X×Y} − W) + δ I_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and ^T denotes the matrix transpose;
Transform the hyperspectral image H into a two-dimensional matrix and take the logarithm to obtain log(flatten(H)); compute the matrix K = (I_B − 1_B 1_B^T / B) · log(flatten(H)) · (I_{X×Y} − 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of size B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of size B×1 and (X×Y)×1, respectively;
Based on G and K, compute the matrix ρ = δKG^{−1}; from ρ, obtain the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)−ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
In one embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers; in every layer the number of convolution kernels is d, the kernel size is [3, 3] and the kernel stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
Compared with the prior art, the above technical solution of the present invention has the following advantages:
1. The remote sensing hyperspectral image and lidar image fusion classification method proposed by the present invention fully fuses the important discriminative information in multi-source remote sensing images, achieves high-precision classification of target pixels, and avoids the loss and attrition of important information during fusion, reducing the drop in classification accuracy caused by missing information;
2. The present invention applies the intrinsic image decomposition theory of hyperspectral images to the fusion classification of hyperspectral and lidar images, adapting intrinsic image decomposition to multi-modal remote sensing image fusion classification and avoiding the conventional practice of discarding the illumination image produced by the decomposition, which reduces information loss;
3. The present invention proposes a method that fully fuses the illumination image decomposed from the hyperspectral image with the lidar image, so that the correlation between the illumination information in the hyperspectral image and the elevation information in the lidar image is fully exploited and utilized, giving full play to the advantages of the illumination image in model construction and improving the final classification performance.
Description of Drawings
To make the content of the present invention easier to understand, the present invention is described in further detail below based on specific embodiments and in conjunction with the accompanying drawings.
Figure 1 is a flow chart of the remote sensing hyperspectral image and lidar image fusion classification method provided by the present invention.
Figure 2 is a schematic framework diagram of the remote sensing hyperspectral image and lidar image fusion classification device provided by the present invention.
The reference numerals are as follows: 10, data acquisition module; 20, image decomposition module; 30, deep network training module; 40, image splicing module; 50, multi-modal fusion module; 60, image classification module.
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention; the given embodiments do not limit the present invention.
Referring to Figure 1, an embodiment of the present invention provides a remote sensing hyperspectral image and lidar image fusion classification method, comprising:
S1: Acquire a hyperspectral image and a lidar image; the class of each ground object in the two images is label;
S2: Perform intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image; for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, select the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of each lidar pixel in the lidar image L has size s×s;
S3: Use the neighborhood blocks to train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s×s×d;
S4: Use a concatenation layer to splice the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in pairs, obtaining O_12, O_34 and O_56;
S5: Input O_34 and O_56 to the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; input O_34 to the 2nd multi-modal grouped convolution layer to obtain O_34^1; input O_56 to the 3rd to obtain O_56^1; input O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; input O_34^1 to the 5th to obtain O_34^2; input O_56^1 to the 6th to obtain O_56^2; input O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; input O_34^2 to the 8th to obtain O_34^3; input O_56^2 to the 9th to obtain O_56^3; input O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; input O_12 and O_3456^1 to the 11th to obtain O_12^1; input O_12^1 and O_3456^2 to the 12th to obtain O_12^2; input O_12^2 and O_3456^3 to the 13th to obtain O_12^3; input O_3456^4 and O_12^3, each of size s×s×d, to a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
S6: Input O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456, of size 1×2d; input O_123456 into a fully connected layer to obtain the final output class (predicted-class formula, image PCTCN2022142160-appb-000007).
Specifically, in step S1 the hyperspectral image H and the lidar image L are selected according to the actual problem, where the hyperspectral image has size X×Y×B, X and Y being the spatial dimensions of each band and B the number of bands, and the lidar image has size X×Y, X and Y being its spatial dimensions; the two images have the same spatial dimensions. The hyperspectral image and the lidar image are normalized, the neighborhood size s is set (s is an odd number greater than 0), the number of convolution kernels in every two-dimensional convolution layer is d, the kernel size is [3, 3], the kernel stride is [1, 1], the padding parameter of every two-dimensional convolution layer is 'Same', and the activation function is the Tanh function. The class of each ground object in the two images is label, the label vector has size 1×(X×Y), and the number of classes is c.
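For illustration, a minimal preprocessing sketch in Python follows; the min-max normalization scheme, the variable names and the random placeholder data are assumptions, since the patent only states that the two images are normalized.

    import numpy as np

    def normalize(img):
        # Min-max scaling to [0, 1]; the exact normalization scheme is not
        # specified in the patent, so this choice is an assumption.
        img = img.astype(np.float64)
        return (img - img.min()) / (img.max() - img.min() + 1e-12)

    H = normalize(np.random.rand(166, 600, 63))   # placeholder hyperspectral cube
    L = normalize(np.random.rand(166, 600))       # placeholder lidar image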
In step S2, the method of performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image comprises:
S2.1: Calculate the matrix D_i corresponding to each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y:
D_i = [H_1, …, H_{i−1}, H_{i+1}, …, H_{X×Y}, I_B] ∈ R^{B×(B+X×Y−1)}
where I_B is the identity matrix of size B×B;
S2.2: Based on D_i, calculate the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
where α_i has shape (B+X×Y−1)×1;
S2.3: Construct a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assign each element W_ij in row i, column j (assignment formula, image PCTCN2022142160-appb-000008); based on W, compute the matrix G = (I_{X×Y} − W^T)(I_{X×Y} − W) + δ I_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and ^T denotes the matrix transpose;
S2.4: Transform the hyperspectral image H into a two-dimensional matrix and take the logarithm to obtain log(flatten(H)); compute the matrix K = (I_B − 1_B 1_B^T / B) · log(flatten(H)) · (I_{X×Y} − 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of size B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of size B×1 and (X×Y)×1, respectively;
S2.5: Based on G and K, compute the matrix ρ = δKG^{−1}; from ρ, obtain the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)−ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
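For illustration, the algebra of steps S2.3 to S2.5 can be followed literally in NumPy once a weight matrix W is available; the sketch below is a toy under stated assumptions (the L1-constrained coding of S2.1 and S2.2 is not implemented, W is a random placeholder, H is assumed strictly positive, and at real image sizes the dense (X×Y)×(X×Y) matrices would need sparse replacements), not the patented implementation.

    import numpy as np

    def intrinsic_decomposition(H, W, delta=1e-3):
        # H: hyperspectral image of size X x Y x B (assumed strictly positive);
        # W: (X*Y) x (X*Y) weight matrix built from the sparse codes alpha_i
        # of steps S2.1-S2.2 (construction not shown here).
        X, Y, B = H.shape
        n = X * Y
        F = H.reshape(n, B).T                      # flatten(H): B x (X*Y)
        I_n = np.eye(n)
        G = (I_n - W.T) @ (I_n - W) + delta * I_n  # step S2.3
        C_B = np.eye(B) - np.ones((B, B)) / B      # I_B - 1_B 1_B^T / B
        C_n = I_n - np.ones((n, n)) / n            # I_n - 1_n 1_n^T / n
        K = C_B @ np.log(F) @ C_n                  # step S2.4
        rho = delta * K @ np.linalg.inv(G)         # step S2.5
        RE = np.exp(rho).T.reshape(X, Y, B)        # intrinsic image
        SH = np.exp(np.log(F) - rho).T.reshape(X, Y, B)  # illumination image
        return RE, SH

    # Toy usage: tiny image and a random row-normalized placeholder W.
    X, Y, B = 6, 5, 4
    H = np.random.rand(X, Y, B) + 0.1
    W = np.random.rand(X * Y, X * Y)
    W /= W.sum(axis=1, keepdims=True)
    RE, SH = intrinsic_decomposition(H, W)
    assert np.allclose(RE * SH, H)                 # RE and SH reconstruct H

Note that the elementwise product RE · SH recovers H exactly, which is the defining property of the decomposition into reflectance-like and illumination components.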
In step S2, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel (each of the three images contains X×Y pixels), the surrounding neighborhood of size s×s is selected as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image H has size s×s×B and the neighborhood block of each lidar pixel in the lidar image L has size s×s.
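A minimal sketch of this neighborhood-block selection follows; edge handling is not specified in the patent, so the reflect padding used here is an assumption.

    import numpy as np

    def neighborhood_blocks(img, s):
        # img: X x Y (lidar) or X x Y x B (hyperspectral-derived); s: odd size.
        r = s // 2
        if img.ndim == 2:
            img = img[..., None]
        X, Y, B = img.shape
        padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
        blocks = np.empty((X * Y, s, s, B), dtype=img.dtype)
        k = 0
        for i in range(X):
            for j in range(Y):
                blocks[k] = padded[i:i + s, j:j + s, :]
                k += 1
        return blocks          # one s x s (x B) neighborhood block per pixel

    blocks = neighborhood_blocks(np.random.rand(20, 30, 5), s=11)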
In step S3, six deep network branches are first constructed, namely L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the intrinsic image RE, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image L, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the illumination image SH. Each of the six branches consists of three two-dimensional convolution layers; in every layer the number of convolution kernels is d and the kernel stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights. The outputs of the six branches are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s×s×d.
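In PyTorch terms, each branch can be sketched as three Conv2d layers with d kernels of size [3, 3], stride [1, 1], 'Same' padding and Tanh activations, with weight sharing realized by reusing the same module object. Since the lidar neighborhood block has a single band while the shared branches expect B channels, the sketch replicates the lidar band B times; that replication is an assumption made only to reconcile the stated s×s×B lidar branch input with the s×s lidar neighborhood block.

    import torch
    import torch.nn as nn

    def make_branch(in_ch, d=120):
        # Three 2-D convolution layers: d kernels of size [3, 3],
        # stride [1, 1], 'Same' padding, Tanh activation (per step S1).
        layers = []
        for c in (in_ch, d, d):
            layers += [nn.Conv2d(c, d, kernel_size=3, stride=1, padding="same"),
                       nn.Tanh()]
        return nn.Sequential(*layers)

    B, d, s = 63, 120, 11
    L1, L2, L6 = make_branch(B, d), make_branch(B, d), make_branch(B, d)
    L3 = L2                       # L2 and L3 share all weights (same module)
    L4 = make_branch(B, d)
    L5 = L4                       # L4 and L5 share all weights

    # The single-band lidar block is replicated to B channels (assumption)
    # so that the weight-sharing branches see a common channel count.
    lidar_block = torch.rand(4, 1, s, s).repeat(1, B, 1, 1)
    O3 = L3(lidar_block)          # (4, d, s, s), i.e. s x s x d per pixel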
The loss function used when training the deep network branches constructed in step S3 is given by the formula shown in image PCTCN2022142160-appb-000009, where label is the input image class and the output image class predicted by the network is denoted by the symbol shown in image PCTCN2022142160-appb-000010.
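The loss itself appears only as an embedded formula image in the source; a categorical cross-entropy between label and the predicted class is the usual choice for a Softmax classifier and is assumed in this minimal sketch.

    import torch
    import torch.nn.functional as F

    # logits: (N, c) pre-Softmax outputs of the final fully connected layer;
    # labels: (N,) integer ground-truth classes. Cross-entropy is assumed,
    # the patent shows the loss only as an embedded formula image.
    logits = torch.randn(8, 6, requires_grad=True)
    labels = torch.randint(0, 6, (8,))
    loss = F.cross_entropy(logits, labels)
    loss.backward()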
In step S4, a concatenation layer (Concatenation Layer) is used to splice the outputs of the six deep network branches in pairs, according to the following formulas:
O_12 = Concatenation(O_1, O_2)
O_34 = Concatenation(O_3, O_4)
O_56 = Concatenation(O_5, O_6).
In step S5, Concatenation(O_34, O_56) is input to the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; O_34 is input to the 2nd multi-modal grouped convolution layer to obtain O_34^1; O_56 is input to the 3rd to obtain O_56^1; Concatenation(O_34^1, O_3456^1, O_56^1) is input to the 4th to obtain O_3456^2; O_34^1 is input to the 5th to obtain O_34^2; O_56^1 is input to the 6th to obtain O_56^2; Concatenation(O_34^2, O_3456^2, O_56^2) is input to the 7th to obtain O_3456^3; O_34^2 is input to the 8th to obtain O_34^3; O_56^2 is input to the 9th to obtain O_56^3; Concatenation(O_34^3, O_3456^3, O_56^3) is input to the 10th to obtain O_3456^4; O_3456^4, of size s×s×d, is input to a two-dimensional average pooling layer (Average Pooling Layer) of size s×s to obtain O_3456^5 of size 1×d. Concatenation(O_12, O_3456^1) is input to the 11th multi-modal grouped convolution layer to obtain O_12^1; Concatenation(O_12^1, O_3456^2) is input to the 12th to obtain O_12^2; Concatenation(O_12^2, O_3456^3) is input to the 13th to obtain O_12^3, of size s×s×d; O_12^3 is input to a two-dimensional average pooling layer of size s×s to obtain O_12^4 of size 1×d.
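The wiring of the thirteen layers can be sketched as follows; the internals of a multi-modal grouped convolution layer are not spelled out in this passage, so modeling each one as a 3×3 Conv2d with groups=2 and d output channels is purely an assumption.

    import torch
    import torch.nn as nn

    class FusionCascade(nn.Module):
        # Wiring of the 13 multi-modal grouped convolution layers of step S5.
        # Each layer is modeled as a 3x3 grouped Conv2d with d output channels;
        # groups=2 and the output width are assumptions, not the patented layer.
        def __init__(self, d=120, s=11):
            super().__init__()
            def g(in_ch):
                return nn.Conv2d(in_ch, d, kernel_size=3, padding="same", groups=2)
            ins = [4*d, 2*d, 2*d, 3*d, d, d, 3*d, d, d, 3*d, 3*d, 2*d, 2*d]
            self.convs = nn.ModuleList(g(c) for c in ins)
            self.pool = nn.AvgPool2d(s)              # s x s average pooling

        def forward(self, O12, O34, O56):
            cat = lambda *xs: torch.cat(xs, dim=1)   # channel concatenation
            g = self.convs
            O3456_1 = g[0](cat(O34, O56))
            O34_1, O56_1 = g[1](O34), g[2](O56)
            O3456_2 = g[3](cat(O34_1, O3456_1, O56_1))
            O34_2, O56_2 = g[4](O34_1), g[5](O56_1)
            O3456_3 = g[6](cat(O34_2, O3456_2, O56_2))
            O34_3, O56_3 = g[7](O34_2), g[8](O56_2)
            O3456_4 = g[9](cat(O34_3, O3456_3, O56_3))
            O12_1 = g[10](cat(O12, O3456_1))
            O12_2 = g[11](cat(O12_1, O3456_2))
            O12_3 = g[12](cat(O12_2, O3456_3))
            O3456_5 = self.pool(O3456_4).flatten(1)  # (N, d)
            O12_4 = self.pool(O12_3).flatten(1)      # (N, d)
            return O12_4, O3456_5

    d = 120
    net = FusionCascade(d=d, s=11)
    O12 = torch.rand(2, 2*d, 11, 11)   # pairwise-concatenated branch outputs
    O34 = torch.rand(2, 2*d, 11, 11)
    O56 = torch.rand(2, 2*d, 11, 11)
    O12_4, O3456_5 = net(O12, O34, O56)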
In step S6, O_12^4 and O_3456^5 are input to a concatenation layer to obtain O_123456 = Concatenation(O_12^4, O_3456^5), of size 1×2d. O_123456 is input to a fully connected layer with c nodes and a Softmax activation function, yielding the final output class (predicted-class formula, image PCTCN2022142160-appb-000011).
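A minimal sketch of this classification head, with the class count c and batch shape assumed:

    import torch
    import torch.nn as nn

    d, c = 120, 6                      # feature width and class count (assumed)
    fc = nn.Linear(2 * d, c)           # fully connected layer with c nodes
    softmax = nn.Softmax(dim=1)

    O12_4, O3456_5 = torch.rand(2, d), torch.rand(2, d)
    O123456 = torch.cat([O12_4, O3456_5], dim=1)   # size 1 x 2d per sample
    probs = softmax(fc(O123456))
    pred = probs.argmax(dim=1)         # final output class per sample
    # For cross-entropy training, one would feed fc(O123456), the pre-Softmax
    # logits, to the loss rather than probs.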
The remote sensing hyperspectral image and lidar image fusion classification method proposed by the present invention efficiently combines intrinsic image decomposition theory with multi-modal remote sensing image fusion classification, gives full play to the advantages of the illumination image in model construction, and reduces the loss and attrition of important discriminative information.
The present invention introduces, for the first time, the intrinsic image decomposition of hyperspectral images into multi-modal remote sensing image fusion classification, balances the illumination information of the illumination image against the elevation information of the lidar image, fuses the two, and uses them to guide the mining of intrinsic information, improving the separability of samples.
The present invention proposes a method that fully fuses the illumination image decomposed from the hyperspectral image with the lidar image, so that the correlation between the illumination information in the hyperspectral image and the elevation information in the lidar image is fully exploited and utilized, giving full play to the advantages of the illumination image in model construction.
The present invention proposes a discriminative feature extraction method for hyperspectral and lidar images, which greatly strengthens the correlation between the two during information mining and reduces the occurrence of information imbalance.
The present invention proposes applying multi-modal grouped convolution layers to the fusion classification of hyperspectral and lidar images, giving full play to their value in this field of research, greatly improving the joint modeling of different modalities, reducing unnecessary redundant information, and strengthening the expression of important information.
The hyperspectral image and lidar image used by the remote sensing hyperspectral image and lidar image fusion classification method proposed by the present invention were captured over Trento, Italy; the hyperspectral image has size 166×600×63 and the lidar image has size 166×600.
(1) Input of this embodiment:
The input hyperspectral image has size 166×600×63, and the input lidar image has size 166×600.
(2) Parameter setting
The neighborhood size is 11, and the number of convolution kernels in every two-dimensional convolution layer is 120.
The hyperspectral image is decomposed to obtain an intrinsic image of size 166×600×63 and an illumination image of size 166×600×63.
Neighborhood information is then selected: for each pixel a neighborhood block of size 11×11×63 is obtained, and the neighborhood blocks are input into the deep network for training.
(3) Training the deep network model
From the total of 99600 sample neighborhood blocks, 10% are randomly selected for training the deep network model; these are randomly shuffled and packed into mini-batches of 512 sample neighborhood blocks, and each training step uses one mini-batch. After training, all 99600 sample neighborhood blocks are input into the deep network model for testing, yielding classification results for all samples, which are evaluated using the overall classification accuracy and the average classification accuracy. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy is obtained by dividing, for each class, the number of correctly classified samples by the number of samples of that class, and then averaging these per-class ratios.
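The two metrics can be computed from a confusion matrix; a small NumPy sketch with assumed variable names:

    import numpy as np

    def overall_and_average_accuracy(y_true, y_pred, c):
        # Confusion matrix: rows = true classes, columns = predicted classes.
        cm = np.zeros((c, c), dtype=np.int64)
        for t, p in zip(y_true, y_pred):
            cm[t, p] += 1
        oa = np.trace(cm) / cm.sum()                   # overall accuracy
        per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
        aa = per_class.mean()                          # average accuracy
        return oa, aa

    oa, aa = overall_and_average_accuracy([0, 1, 1, 2], [0, 1, 0, 2], 3)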
(4) Results of this embodiment
The classification results obtained with the remote sensing hyperspectral image and lidar image fusion classification method and device proposed by the present invention, and with the currently common multi-stream encoder, are shown in Table 1 below.
Table 1
Method                                Overall classification accuracy    Average classification accuracy
Method of the present invention       92.84%                             90.74%
Commonly used multi-stream encoder    86.23%                             83.79%
It can be seen that the method of the present invention fuses and classifies the hyperspectral and lidar images well, with fewer misclassified samples. In addition, when the hyperspectral image decomposition part of the method is removed and the above experiment is repeated, the overall classification accuracy obtained is 85.49%, which shows that the method of the present invention has strong information mining capability. In summary, the method of the present invention can effectively improve the separability and classification accuracy of multi-source remote sensing images.
The remote sensing hyperspectral image and lidar image fusion classification device disclosed in Embodiment 2 of the present invention is introduced below; the device described below and the fusion classification method described above may be referred to in correspondence with each other.
Referring to Figure 2, an embodiment of the present invention provides a remote sensing hyperspectral image and lidar image fusion classification device, comprising:
Data acquisition module 10, which acquires a hyperspectral image and a lidar image; the class of each ground object in the two images is label;
Image decomposition module 20, which performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, selects the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of each lidar pixel in the lidar image L has size s×s;
Deep network training module 30, which uses the neighborhood blocks to train the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s×s×d;
Image splicing module 40, which uses a concatenation layer to splice the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in pairs, obtaining O_12, O_34 and O_56;
Multi-modal fusion module 50, which inputs O_34 and O_56 to the 1st multi-modal grouped convolution layer to obtain the output O_3456^1, inputs O_34 to the 2nd multi-modal grouped convolution layer to obtain O_34^1, inputs O_56 to the 3rd to obtain O_56^1, inputs O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2, inputs O_34^1 to the 5th to obtain O_34^2, inputs O_56^1 to the 6th to obtain O_56^2, inputs O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3, inputs O_34^2 to the 8th to obtain O_34^3, inputs O_56^2 to the 9th to obtain O_56^3, inputs O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4, inputs O_12 and O_3456^1 to the 11th to obtain O_12^1, inputs O_12^1 and O_3456^2 to the 12th to obtain O_12^2, inputs O_12^2 and O_3456^3 to the 13th to obtain O_12^3, and inputs O_3456^4 and O_12^3, each of size s×s×d, to a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
Image classification module 60, which inputs O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456, of size 1×2d, and inputs O_123456 into a fully connected layer to obtain the final output class (predicted-class formula, image PCTCN2022142160-appb-000012).
In one embodiment of the present invention, the data acquisition module 10 includes a data preprocessing submodule, which normalizes the hyperspectral image and the lidar image after they are selected.
In one embodiment of the present invention, the image decomposition module 20 performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, as follows:
Calculate the matrix D_i corresponding to each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y:
D_i = [H_1, …, H_{i−1}, H_{i+1}, …, H_{X×Y}, I_B] ∈ R^{B×(B+X×Y−1)}
where I_B is the identity matrix of size B×B;
Based on D_i, calculate the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
where α_i has shape (B+X×Y−1)×1;
Construct a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assign each element W_ij in row i, column j (assignment formula, image PCTCN2022142160-appb-000013); based on W, compute the matrix G = (I_{X×Y} − W^T)(I_{X×Y} − W) + δ I_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and ^T denotes the matrix transpose;
Transform the hyperspectral image H into a two-dimensional matrix and take the logarithm to obtain log(flatten(H)); compute the matrix K = (I_B − 1_B 1_B^T / B) · log(flatten(H)) · (I_{X×Y} − 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of size B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of size B×1 and (X×Y)×1, respectively;
Based on G and K, compute the matrix ρ = δKG^{−1}; from ρ, obtain the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)−ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
In one embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers; in every layer the number of convolution kernels is d, the kernel size is [3, 3] and the kernel stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
The remote sensing hyperspectral image and lidar image fusion classification device of this embodiment is used to implement the aforementioned remote sensing hyperspectral image and lidar image fusion classification method; for its specific implementation, reference may therefore be made to the description of the corresponding method embodiments above, which is not repeated here.
In addition, since the device of this embodiment is used to implement the aforementioned fusion classification method, its functions correspond to those of the method described above and are likewise not repeated here.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Obviously, the above embodiments are merely examples given for clarity of description and are not a limitation on the implementations. Those of ordinary skill in the art may make other changes or modifications on the basis of the above description; it is neither necessary nor possible to enumerate all implementations here, and the obvious changes or modifications derived therefrom remain within the protection scope of the present invention.

Claims (10)

1. A remote sensing hyperspectral image and lidar image fusion classification method, characterized by comprising:
    S1: acquiring a hyperspectral image and a lidar image, the class of each ground object in the two images being label;
    S2: performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, selecting the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of each lidar pixel in the lidar image L has size s×s;
    S3: using the neighborhood blocks to train deep network branches L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image, the outputs being O_1, O_2, O_3, O_4, O_5 and O_6, each of size s×s×d;
    S4: using a concatenation layer to splice the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in pairs, obtaining O_12, O_34 and O_56;
    S5: inputting O_34 and O_56 to the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; inputting O_34 to the 2nd multi-modal grouped convolution layer to obtain O_34^1; inputting O_56 to the 3rd to obtain O_56^1; inputting O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2; inputting O_34^1 to the 5th to obtain O_34^2; inputting O_56^1 to the 6th to obtain O_56^2; inputting O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3; inputting O_34^2 to the 8th to obtain O_34^3; inputting O_56^2 to the 9th to obtain O_56^3; inputting O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4; inputting O_12 and O_3456^1 to the 11th to obtain O_12^1; inputting O_12^1 and O_3456^2 to the 12th to obtain O_12^2; inputting O_12^2 and O_3456^3 to the 13th to obtain O_12^3; and inputting O_3456^4 and O_12^3, each of size s×s×d, to a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
    S6: inputting O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456, of size 1×2d, and inputting O_123456 into a fully connected layer to obtain the final output class (predicted-class formula, image PCTCN2022142160-appb-100001).
2. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: in step S1, after the hyperspectral image and the lidar image are selected, they are normalized.
3. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1 or 2, characterized in that: in step S2, the method of performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image comprises:
    S2.1: calculating the matrix D_i corresponding to each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y:
    D_i = [H_1, …, H_{i−1}, H_{i+1}, …, H_{X×Y}, I_B] ∈ R^{B×(B+X×Y−1)}
    where I_B is the identity matrix of size B×B;
    S2.2: based on D_i, calculating the vector α_i corresponding to each hyperspectral pixel H_i:
    min ||α_i||_1  s.t.  H_i = D_i α_i
    where α_i has shape (B+X×Y−1)×1;
    S2.3: constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning each element W_ij in row i, column j (assignment formula, image PCTCN2022142160-appb-100002); based on W, computing the matrix G = (I_{X×Y} − W^T)(I_{X×Y} − W) + δ I_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and ^T denotes the matrix transpose;
    S2.4: transforming the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)); computing the matrix K = (I_B − 1_B 1_B^T / B) · log(flatten(H)) · (I_{X×Y} − 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of size B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of size B×1 and (X×Y)×1, respectively;
    S2.5: based on G and K, computing the matrix ρ = δKG^{−1}, and, from ρ, obtaining the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)−ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
4. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in step S3 each comprise multiple two-dimensional convolution layers; in every layer the number of convolution kernels is d, the kernel size is [3, 3] and the kernel stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
5. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: the loss function used when training the deep network branches constructed in step S3 is given by the formula shown in image PCTCN2022142160-appb-100003.
6. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1 or 4, characterized in that: in step S4, a concatenation layer splices the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in pairs, obtaining O_12, O_34 and O_56 according to the formulas O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4), O_56 = Concatenation(O_5, O_6).
7. A remote sensing hyperspectral image and lidar image fusion classification device, characterized by comprising:
    a data acquisition module, which acquires a hyperspectral image and a lidar image, the class of each ground object in the two images being label;
    an image decomposition module, which performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, selects the surrounding neighborhood of size s×s as that pixel's neighborhood block, where the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s×s×B and the neighborhood block of each lidar pixel in the lidar image L has size s×s;
    a deep network training module, which uses the neighborhood blocks to train deep network branches L_1, L_2, L_3, L_4, L_5 and L_6, where the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s×s×B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s×s×B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s×s×B from the hyperspectral illumination image, the outputs being O_1, O_2, O_3, O_4, O_5 and O_6, each of size s×s×d;
    an image splicing module, which uses a concatenation layer to splice the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in pairs, obtaining O_12, O_34 and O_56;
    a multi-modal fusion module, which inputs O_34 and O_56 to the 1st multi-modal grouped convolution layer to obtain the output O_3456^1, inputs O_34 to the 2nd multi-modal grouped convolution layer to obtain O_34^1, inputs O_56 to the 3rd to obtain O_56^1, inputs O_34^1, O_3456^1 and O_56^1 to the 4th to obtain O_3456^2, inputs O_34^1 to the 5th to obtain O_34^2, inputs O_56^1 to the 6th to obtain O_56^2, inputs O_34^2, O_3456^2 and O_56^2 to the 7th to obtain O_3456^3, inputs O_34^2 to the 8th to obtain O_34^3, inputs O_56^2 to the 9th to obtain O_56^3, inputs O_34^3, O_3456^3 and O_56^3 to the 10th to obtain O_3456^4, inputs O_12 and O_3456^1 to the 11th to obtain O_12^1, inputs O_12^1 and O_3456^2 to the 12th to obtain O_12^2, inputs O_12^2 and O_3456^3 to the 13th to obtain O_12^3, and inputs O_3456^4 and O_12^3, each of size s×s×d, to a two-dimensional average pooling layer of size s×s to obtain O_3456^5 and O_12^4 of size 1×d;
    an image classification module, which inputs O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456, of size 1×2d, and inputs O_123456 into a fully connected layer to obtain the final output class (predicted-class formula, image PCTCN2022142160-appb-100004).
8. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7, characterized in that: the data acquisition module includes a data preprocessing submodule, which normalizes the hyperspectral image and the lidar image after they are selected.
9. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7 or 8, characterized in that: the image decomposition module performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, comprising:
    calculating the matrix D_i corresponding to each hyperspectral pixel H_i, where 1 ≤ i ≤ X×Y:
    D_i = [H_1, …, H_{i−1}, H_{i+1}, …, H_{X×Y}, I_B] ∈ R^{B×(B+X×Y−1)}
    where I_B is the identity matrix of size B×B;
    based on D_i, calculating the vector α_i corresponding to each hyperspectral pixel H_i:
    min ||α_i||_1  s.t.  H_i = D_i α_i
    where α_i has shape (B+X×Y−1)×1;
    constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning each element W_ij in row i, column j (assignment formula, image PCTCN2022142160-appb-100005); based on W, computing the matrix G = (I_{X×Y} − W^T)(I_{X×Y} − W) + δ I_{X×Y}, where I_{X×Y} is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and ^T denotes the matrix transpose;
    transforming the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)); computing the matrix K = (I_B − 1_B 1_B^T / B) · log(flatten(H)) · (I_{X×Y} − 1_{X×Y} 1_{X×Y}^T / (X×Y)), where I_B and I_{X×Y} are identity matrices of size B×B and (X×Y)×(X×Y), and 1_B and 1_{X×Y} are all-ones vectors of size B×1 and (X×Y)×1, respectively;
    based on G and K, computing the matrix ρ = δKG^{−1}, and, from ρ, obtaining the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)−ρ} decomposed from the hyperspectral image H, where e is the natural constant; both images have size X×Y×B.
10. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7, characterized in that: the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers; in every layer the number of convolution kernels is d, the kernel size is [3, 3] and the kernel stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
PCT/CN2022/142160 2022-08-26 2022-12-27 Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image WO2024040828A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211037953.6A CN115331110A (en) 2022-08-26 2022-08-26 Fusion classification method and device for remote sensing hyperspectral image and laser radar image
CN202211037953.6 2022-08-26

Publications (1)

Publication Number Publication Date
WO2024040828A1 true WO2024040828A1 (en) 2024-02-29

Family

ID=83928217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142160 WO2024040828A1 (en) 2022-08-26 2022-12-27 Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image

Country Status (2)

Country Link
CN (1) CN115331110A (en)
WO (1) WO2024040828A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331110A (en) * 2022-08-26 2022-11-11 苏州大学 Fusion classification method and device for remote sensing hyperspectral image and laser radar image
CN116167955A (en) * 2023-02-24 2023-05-26 苏州大学 Hyperspectral and laser radar image fusion method and system for remote sensing field

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090318815A1 (en) * 2008-05-23 2009-12-24 Michael Barnes Systems and methods for hyperspectral medical imaging
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN112967350A (en) * 2021-03-08 2021-06-15 哈尔滨工业大学 Hyperspectral remote sensing image eigen decomposition method and system based on sparse image coding
CN114742985A (en) * 2022-03-17 2022-07-12 苏州大学 Hyperspectral feature extraction method and device and storage medium
CN115331110A (en) * 2022-08-26 2022-11-11 苏州大学 Fusion classification method and device for remote sensing hyperspectral image and laser radar image

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830752A (en) * 2024-03-06 2024-04-05 昆明理工大学 Self-adaptive space-spectrum mask graph convolution method for multi-spectrum point cloud classification
CN117830752B (en) * 2024-03-06 2024-05-07 昆明理工大学 Self-adaptive space-spectrum mask graph convolution method for multi-spectrum point cloud classification
CN117934978A (en) * 2024-03-22 2024-04-26 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning
CN117934978B (en) * 2024-03-22 2024-06-11 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning

Also Published As

Publication number Publication date
CN115331110A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2024040828A1 (en) Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image
CN113011499B (en) Hyperspectral remote sensing image classification method based on double-attention machine system
CN107766850B (en) Face recognition method based on combination of face attribute information
CN111275007B (en) Bearing fault diagnosis method and system based on multi-scale information fusion
CN112926641B (en) Three-stage feature fusion rotating machine fault diagnosis method based on multi-mode data
WO2021082480A1 (en) Image classification method and related device
CN112801040B (en) Lightweight unconstrained facial expression recognition method and system embedded with high-order information
CN109190511B (en) Hyperspectral classification method based on local and structural constraint low-rank representation
CN113066065B (en) No-reference image quality detection method, system, terminal and medium
CN110222718A (en) The method and device of image procossing
WO2023125456A1 (en) Multi-level variational autoencoder-based hyperspectral image feature extraction method
CN114419413A (en) Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
CN113065426B (en) Gesture image feature fusion method based on channel perception
CN112732921B (en) False user comment detection method and system
US20230316699A1 (en) Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation
CN111860683A (en) Target detection method based on feature fusion
CN112667071A (en) Gesture recognition method, device, equipment and medium based on random variation information
CN115564996A (en) Hyperspectral remote sensing image classification method based on attention union network
CN110837808A (en) Hyperspectral image classification method based on improved capsule network model
CN111860601B (en) Method and device for predicting type of large fungi
CN113743079A (en) Text similarity calculation method and device based on co-occurrence entity interaction graph
CN111652349A (en) Neural network processing method and related equipment
CN114373080B (en) Hyperspectral classification method of lightweight hybrid convolution model based on global reasoning
CN115965819A (en) Lightweight pest identification method based on Transformer structure
CN116595133A (en) Visual question-answering method based on stacked attention and gating fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956360

Country of ref document: EP

Kind code of ref document: A1