Disclosure of Invention
To solve the above problems, the invention provides a hyperspectral image unmixing method and system based on a 3D convolutional neural network. The unmixing method mainly comprises the following steps:
s101: constructing n training samples of size S × S × L and setting the size of the corresponding end-member abundance samples to S × S × M; taking each pixel of each training sample as a center, extracting S1 × S1 spatial features, and attaching the abundance value of each center pixel to form the training sample set; wherein S is the spatial size of the training samples, L is the number of spectral bands, M is the number of ground-object classes, and S1 is the window size; the training samples have the same number of spectral bands as the real image;
s102: training the 3D convolutional neural network with the training sample set obtained in step S101 to obtain a trained 3D convolutional neural network;
s103: unmixing the real image to be unmixed with the trained 3D convolutional neural network to obtain the abundance matrix of the real image; the abundance matrix is the unmixing result.
Further, in step S102, the 3D convolutional neural network has two 3D convolutional layers, C1 and C2, and one fully connected layer, F1; C1 has 2 convolution kernels and C2 has 4 convolution kernels.
Further, in step S102, the 3D convolutional neural network is trained with a softmax classifier; the error is back-propagated by gradient descent to minimize the network loss, and the convolution kernels of the network are updated by the additional-momentum method, as shown in formulas (1) and (2):
m_{i+1} = mc·m_i + (1 − mc)·∂E/∂w_i   (1)
w_{i+1} = w_i − ε·m_{i+1}   (2)
in the above formulas, i > 0 is the iteration number, m is the momentum, ε is the learning rate, w is the weight, mc is the momentum factor, and ∂E/∂w_i is the gradient of the loss with respect to the weights.
Further, in step S103, unmixing the real image with the trained 3D convolutional neural network specifically comprises:
s201: inputting the real image into the first 3D convolutional layer to obtain 2 pieces of three-dimensional data C1out;
s202: processing the three-dimensional data C1out with a ReLU function to obtain 2 pieces of processed three-dimensional data C1REout, and feeding the 2 pieces of C1REout into the second convolutional layer to obtain 4 pieces of three-dimensional data C2out;
s203: processing the three-dimensional data C2out with a ReLU function to obtain 4 pieces of processed three-dimensional data C2REout, and arranging the four pieces of C2REout in a row in order to obtain 1 feature vector;
s204: and feeding the feature vectors into a full-link layer F1 as input to obtain a corresponding abundance matrix.
Further, in step S103, an image is reconstructed from the obtained unmixing result; the reconstructed image is then compared with the real image and their quantitative error RMSE is calculated; the unmixing performance of the current 3D convolutional neural network is judged from this RMSE, a smaller RMSE indicating a better unmixing result; the RMSE is computed as shown in formula (3):

RMSE = sqrt( (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)² )   (3)

in the above formula, y_i is the ith pixel of the real image, ŷ_i is the ith pixel of the reconstructed image, i = 1, 2, …, n, and n is the total number of pixels in the reconstructed image; the reconstructed image and the real image have the same number of pixels.
Further, a hyperspectral image unmixing system based on the 3D convolutional neural network is characterized in that it comprises the following modules:
a training sample construction module, configured to construct n training samples of size S × S × L, set the size of the corresponding end-member abundance samples to S × S × M, take each pixel of each training sample as a center, extract S1 × S1 spatial features, and combine the abundance values of the center pixels to form the training sample set; wherein S is the spatial size of the training samples, L is the number of spectral bands, M is the number of ground-object classes, and S1 is the window size; the training samples have the same number of spectral bands as the real image;
a network training module, configured to train the 3D convolutional neural network with the training sample set obtained by the training sample construction module, to obtain a trained 3D convolutional neural network;
an unmixing module, configured to unmix the real image to be unmixed with the trained 3D convolutional neural network to obtain the abundance matrix of the real image; the abundance matrix is the unmixing result.
The beneficial effects of the technical scheme provided by the invention are as follows: the invention provides a novel hyperspectral image unmixing method; unmixing a hyperspectral image with the proposed 3D convolutional neural network yields satisfactory results with almost no parameter tuning, and compared with other methods the proposed scheme is simpler and more practical.
Detailed Description
For a clearer understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a hyperspectral image unmixing method and a hyperspectral image unmixing system based on a 3D convolutional neural network.
Referring to fig. 1, fig. 1 is a flowchart of a hyperspectral image unmixing method based on a 3D convolutional neural network in an embodiment of the present invention, which specifically includes the following steps:
s101: constructing n training samples of size S × S × L and setting the size of the corresponding end-member abundance samples to S × S × M; taking each pixel of each training sample as a center, extracting S1 × S1 spatial features, and attaching the abundance value of each center pixel to form the training sample set; wherein S is the spatial size of the training samples, L is the number of spectral bands, M is the number of ground-object classes, and S1 is the window size; the training samples have the same number of spectral bands as the real image;
s102: training the 3D convolutional neural network with the training sample set obtained in step S101 to obtain a trained 3D convolutional neural network;
s103: unmixing the real image to be unmixed with the trained 3D convolutional neural network to obtain the abundance matrix of the real image; the abundance matrix is the unmixing result.
In step S102, the 3D convolutional neural network has two 3D convolutional layers, C1 and C2, and one fully connected layer, F1; C1 has 2 convolution kernels and C2 has 4 convolution kernels.
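As an illustration only, the following is a minimal PyTorch sketch of the architecture just described (two 3D convolutional layers with 2 and 4 kernels, one fully connected layer). The class name, the kernel sizes k1 and k2, and the softmax output over the M end members are assumptions of the sketch; the disclosure does not fix the kernel sizes.

```python
import torch
import torch.nn as nn

class Unmix3DCNN(nn.Module):
    """Sketch of the described network: two 3D convolutional layers
    (2 and 4 kernels) followed by one fully connected layer F1.
    Kernel sizes k1 = (H1, W1, D1) and k2 = (H2, W2, D2) are placeholders."""

    def __init__(self, bands, s2, m_endmembers, k1=(3, 3, 5), k2=(3, 3, 5)):
        super().__init__()
        self.c1 = nn.Conv3d(1, 2, kernel_size=k1)  # C1: 2 convolution kernels
        self.c2 = nn.Conv3d(2, 4, kernel_size=k2)  # C2: 4 convolution kernels
        # Sizes after two valid 3D convolutions, matching the text below:
        h = s2 - k1[0] - k2[0] + 2
        w = s2 - k1[1] - k2[1] + 2
        d = bands - k1[2] - k2[2] + 2
        self.f1 = nn.Linear(4 * h * w * d, m_endmembers)  # F1

    def forward(self, x):
        # x: (batch, 1, S2, S2, L)
        x = torch.relu(self.c1(x))   # steps S201-S202: C1out -> C1REout
        x = torch.relu(self.c2(x))   # steps S202-S203: C2out -> C2REout
        x = x.flatten(1)             # step S203: one feature vector per sample
        return torch.softmax(self.f1(x), dim=1)  # abundances sum to one
```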
In step S102, the 3D convolutional neural network is trained with a softmax classifier; the error is back-propagated by gradient descent to minimize the network loss, and the convolution kernels of the network are updated by the additional-momentum method, as shown in formulas (1) and (2):
m_{i+1} = mc·m_i + (1 − mc)·∂E/∂w_i   (1)
w_{i+1} = w_i − ε·m_{i+1}   (2)
in the above formulas, i > 0 is the iteration number, m is the momentum, ε is the learning rate, w is the weight, mc is the momentum factor, and ∂E/∂w_i is the gradient of the loss with respect to the weights.
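A minimal sketch of the update in formulas (1) and (2); the function name and the explicit gradient argument are illustrative, and the form of formula (1) follows the standard additional-momentum rule as reconstructed above.

```python
def momentum_step(w, m, grad, lr=0.01, mc=0.9):
    """One additional-momentum update per formulas (1) and (2).
    w: weights w_i, m: momentum m_i, grad: gradient of the loss at w_i,
    lr: learning rate epsilon, mc: momentum factor."""
    m_next = mc * m + (1.0 - mc) * grad   # formula (1)
    w_next = w - lr * m_next              # formula (2)
    return w_next, m_next
```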
In step S103, unmixing the real image with the trained 3D convolutional neural network specifically comprises:
s201: inputting the real image into the first 3D convolutional layer to obtain 2 pieces of three-dimensional data C1out; specifically:
C1 comprises 2 3D convolution kernels of size H1 × W1 × D1; after each sample is 3D-convolved with the 2 kernels, 2 pieces of three-dimensional data C1out of size (S2−H1+1) × (S2−W1+1) × (L−D1+1) are obtained; S2 is the spatial size of the real image and L is its number of spectral bands;
s202: processing the three-dimensional data C1out with a ReLU function to obtain 2 pieces of processed three-dimensional data C1REout, and feeding the 2 pieces of C1REout into the second convolutional layer to obtain 4 pieces of three-dimensional data C2out; specifically:
the second 3D convolutional layer C2 involved 4 3D convolutional kernels of size H2 × W2 × D2. After each input three-dimensional data is respectively subjected to 3D convolution with 4 convolution kernels, the results of the 4 three-dimensional data obtained by the same convolution kernel are accumulated and summed to obtain 4 three-dimensional data C2out with the size of (S2-H1-H2+2) × (S2-W1-W2+2) × (L-D1-D2+ 2);
s203: processing the three-dimensional data C2out with a ReLU function to obtain 4 pieces of processed three-dimensional data C2REout, and arranging the four pieces of C2REout into a row in order (the order can be set as required) to obtain one feature vector of dimension 4 × (S2−H1−H2+2) × (S2−W1−W2+2) × (L−D1−D2+2);
s204: feeding the feature vector into the fully connected layer F1 to obtain the corresponding abundance matrix.
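As a quick sanity check of the shapes in steps S201 to S204, the sketch class above can be run on a dummy patch; all values here (patch size 5, kernel sizes 2 × 2 × 5) are placeholders.

```python
import torch

net = Unmix3DCNN(bands=196, s2=5, m_endmembers=11, k1=(2, 2, 5), k2=(2, 2, 5))
x = torch.randn(1, 1, 5, 5, 196)  # one 5 x 5 x 196 input patch
abund = net(x)
print(abund.shape)                # torch.Size([1, 11]) -- one abundance vector
print(abund.sum(dim=1))           # ~1.0: the softmax enforces "sum to one"
```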
In step S103, an image corresponding to the abundance matrix is reconstructed from the obtained unmixing result; the reconstructed image is then compared with the real image and their quantitative error RMSE is calculated; the unmixing performance of the current 3D convolutional neural network is judged from this RMSE, a smaller RMSE indicating a better unmixing result; the RMSE is computed as shown in formula (3):

RMSE = sqrt( (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)² )   (3)

in the above formula, y_i is the ith pixel of the real image, ŷ_i is the ith pixel of the reconstructed image, i = 1, 2, …, n, and n is the total number of pixels in the reconstructed image; the reconstructed image and the real image have the same number of pixels.
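A minimal sketch of formula (3), assuming the real and reconstructed images are given as arrays with one row per pixel; the exact normalization (per pixel versus per pixel and band) is an assumption of the sketch.

```python
import numpy as np

def reconstruction_rmse(y_true, y_rec):
    """Formula (3): RMSE between the real and the reconstructed image.
    y_true, y_rec: arrays of shape (n_pixels,) or (n_pixels, n_bands)."""
    y_true, y_rec = np.asarray(y_true), np.asarray(y_rec)
    assert y_true.shape == y_rec.shape  # equal numbers of pixels
    return np.sqrt(np.mean((y_true - y_rec) ** 2))
```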
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a module composition of a hyperspectral image unmixing system based on a 3D convolutional neural network in an embodiment of the present invention, including a training sample construction module 11, a network training module 12, and an unmixing module 13, which are connected in sequence;
a training sample construction module 11, configured to construct n training samples of size S × S × L, set the size of the corresponding end-member abundance samples to S × S × M, take each pixel of each training sample as a center, extract S1 × S1 spatial features, and combine the abundance values of the center pixels to form the training sample set; wherein S is the spatial size of the training samples, L is the number of spectral bands, M is the number of ground-object classes, and S1 is the window size; the training samples have the same number of spectral bands as the real image;
a network training module 12, configured to train the 3D convolutional neural network with the training sample set obtained by the training sample construction module 11, to obtain a trained 3D convolutional neural network;
an unmixing module 13, configured to unmix the real image to be unmixed with the trained 3D convolutional neural network to obtain the abundance matrix of the real image; the abundance matrix is the unmixing result.
In the network training module 12, the 3D convolutional neural network has two 3D convolutional layers C1 and C2 and a fully connected layer F1; there are 2 convolution kernels in C1 and 4 convolution kernels in C2.
In this embodiment, the real data used are AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) data acquired in 1995 over Cuprite, Nevada, USA; experiments on these data demonstrate the effectiveness of the proposed method. The area mainly contains minerals such as Alunite, Kaolinite, Chalcedony, Muscovite, Montmorillonite, Jarosite and Calcite, for which typical pure pixels exist, as well as small amounts of minerals such as Buddingtonite, Nontronite, Halloysite and Dickite. This dataset is widely used in research on hyperspectral end-member extraction and is highly representative.
The spectral library used in the embodiment of the present invention is the USGS (United States Geological Survey) mineral spectral library M, of size 224 × 498, where 224 is the number of spectral bands and 498 is the number of ground-object types, i.e. the number of end members.
Bands with a low signal-to-noise ratio or strong water-vapor absorption are removed from the USGS spectral library and from the Cuprite hyperspectral image, leaving 196 bands. For convenience of calculation, the image used in this experiment is a 143 × 123 portion of the full image.
In the experiment, the 11 end members contained in the end-member set of the image are first determined, and the corresponding 11 spectra are extracted from the USGS mineral spectral library M to form the experimental end-member set; the end-member sample set therefore has size 196 × 11. In practice, a mineral mixture generally does not contain all end members, so when synthesizing the hyperspectral image samples, mixtures of at most 5 end members are considered: for each sample a number p is drawn at random as the number of end members, where p is a positive integer between 1 and 5. The abundances generated for the p end members randomly chosen from the 11 obey a Dirichlet distribution and satisfy the "sum to one" and "non-negative" constraints. When generating the training samples, spatially adjacent pixels are given the same or similar abundances. Finally, the hyperspectral image is synthesized according to the linear model, with a signal-to-noise ratio of 60 dB.
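The sample synthesis just described can be sketched as follows, assuming an end-member matrix E of size 196 × 11 taken from the USGS library; the function names are illustrative, and this per-pixel sketch omits the spatial smoothing of adjacent abundances.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_pixel(E, max_p=5):
    """E: end-member matrix of shape (196, 11).
    Draws p <= 5 random end members, Dirichlet abundances
    ("sum to one", "non-negative"), and mixes them linearly."""
    n_bands, n_em = E.shape
    p = rng.integers(1, max_p + 1)                 # number of active end members
    idx = rng.choice(n_em, size=p, replace=False)  # which end members
    a = np.zeros(n_em)
    a[idx] = rng.dirichlet(np.ones(p))             # abundances of the chosen set
    return E @ a, a                                # linear mixing model

def add_noise(y, snr_db=60.0):
    """Adds white Gaussian noise at the stated 60 dB signal-to-noise ratio."""
    noise_power = np.mean(y ** 2) / 10.0 ** (snr_db / 10.0)
    return y + rng.normal(scale=np.sqrt(noise_power), size=y.shape)
```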
The experiment processes data with known ground-object types, so the algorithm does not need to estimate the number of end members. Because an ordinary mixed hyperspectral image lacks reference abundances, abundance-map samples for each end member must first be constructed when training the network, and the hyperspectral image samples are then constructed from the known spectral library.
The hyperspectral image samples are of size 100 × 100 × 196 and the corresponding abundance samples of the end members are of size 100 × 100 × 11. Taking each pixel as a center, 5 × 5 × 196 features are extracted from the hyperspectral image sample and combined with the unmixing results (abundances) of the center pixels to form the training sample set. After the parameters of the 3D convolutional neural network have been tuned on these samples, the network is used to unmix the real data; the experimental flow is shown in FIG. 3.
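Patch extraction around each pixel can be sketched as below; the reflect-padding at the image border is an assumption, since the text does not specify how edge pixels are handled.

```python
import numpy as np

def extract_patches(img, abund, s1=5):
    """img: synthesized sample of shape (100, 100, 196);
    abund: abundance maps of shape (100, 100, 11).
    Returns one (s1, s1, 196) patch per pixel together with the
    abundance vector of the patch's center pixel."""
    r = s1 // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches, labels = [], []
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patches.append(padded[i:i + s1, j:j + s1, :])
            labels.append(abund[i, j])
    return np.stack(patches), np.stack(labels)
```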
The experimental results are shown in Table 1. The algorithms used for comparison are FCLS and L1-norm sparse decomposition; Table 1 gives the reconstruction errors of the three algorithms on the Cuprite hyperspectral data. FIGS. 4 to 9 show the abundance results for the typical minerals Alunite and Chalcedony, among the 11 end members, as solved by the 3 algorithms.
When solving with the fully constrained least squares (FCLS) method, 2000 iterations are performed.
When solving with the L1-norm-based sparse mixed-pixel decomposition model, λ is set to 10⁻⁴ and the number of iterations is 2000.
When training on the data, the 3D convolutional neural network uses mini-batches, with a batch size of 20, 2000 iterations and a learning rate of 0.01.
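With the sketch class above, the stated training settings (mini-batches of 20, 2000 iterations, learning rate 0.01) can be written as follows; sample_batch is a hypothetical loader over the patch set, PyTorch's built-in SGD momentum stands in for formulas (1) and (2), and the mean-squared-error loss against the abundance samples is an assumption.

```python
import torch

net = Unmix3DCNN(bands=196, s2=5, m_endmembers=11, k1=(2, 2, 5), k2=(2, 2, 5))
opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)  # mc = 0.9 assumed
loss_fn = torch.nn.MSELoss()  # fit predicted to sampled abundances (assumed)

for step in range(2000):          # 2000 iterations
    xb, yb = sample_batch(20)     # hypothetical: 20 patches and their abundances
    opt.zero_grad()
    loss = loss_fn(net(xb), yb)
    loss.backward()
    opt.step()
```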
Table 1 Reconstruction errors of different algorithms on the Cuprite hyperspectral data
| Algorithm | FCLS | L1 sparse decomposition | 3D-CNN |
| RMSE | 0.0453 | 0.0427 | 0.0394 |
The comparison of the spectral reconstruction errors (RMSE) of the three algorithms in Table 1 shows that the unmixing result based on the 3D convolutional neural network is better than that of the other two algorithms. Since no reference abundance maps are available for comparison, the quality can only be judged roughly, visually and numerically; overall, however, the results of the 3D convolutional neural network are satisfactory. Its most important advantage is that good results are obtained with very little parameter tuning.
The beneficial effects of the invention are as follows: the invention provides a novel hyperspectral image unmixing method; unmixing a hyperspectral image with the proposed 3D convolutional neural network yields satisfactory results with almost no parameter tuning, and compared with other methods the proposed scheme is simpler and more practical.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.