CN107273938B

CN107273938B - Multi-source remote sensing image ground object classification method based on two-channel convolution ladder network

Info

Publication number: CN107273938B
Application number: CN201710571057.0A
Authority: CN
Inventors: 焦李成; 屈嵘; 高倩; 马文萍; 杨淑媛; 侯彪; 刘芳; 尚荣华; 张向荣; 张丹; 唐旭; 马晶晶
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2017-07-13
Filing date: 2017-07-13
Publication date: 2020-05-29
Anticipated expiration: 2037-07-13
Also published as: CN107273938A

Abstract

The invention discloses a multisource remote sensing image ground object classification method based on a two-channel convolution ladder network, which is characterized in that multispectral data of an area to be classified, which are obtained by a landsat-8 sensor and a sentinel-2 sensor, are respectively normalized by using ENVI software to obtain normalized multispectral data; taking each element of the normalized data, and taking a peripheral 28 multiplied by 28 block to represent an original element value to form a characteristic matrix based on the image block; randomly selecting a plurality of blocks in each class to form a training data set L and S; constructing a multi-source remote sensing image ground object classification model based on a two-channel convolution ladder network; training a two-channel convolution ladder network multisource remote sensing image ground object classification model by using a training data set L and a training data set S; and classifying the test data set by using the trained two-channel convolution ladder network multi-source remote sensing image surface feature classification model. The method can obtain high multi-source image classification precision by using a small number of samples with class marks, and can be used for target detection.

Description

Multi-source remote sensing image ground object classification method based on two-channel convolution ladder network

Technical Field

The invention belongs to the field of image processing, and particularly relates to a multisource remote sensing image ground object classification method based on a two-channel convolution ladder network.

Background

With the development of remote sensing technology, more and more multispectral, multi-resolution and multi-temporal image data of the same region are acquired by various remote sensors, and abundant and precious data are provided for natural resource investigation, environmental monitoring and the like. However, the image data obtained by various single remote sensing means has obvious limitations and differences in geometric, spectral and spatial resolutions, so that the capacity of the image data for classification is limited. It is obviously important to combine their respective advantages and complements for classification. The information fusion technology is a technology for comprehensive processing of multi-element information, and can synthesize multi-source information to generate more accurate and more complete estimation and judgment compared with unit information. The information fusion technology can be divided into fusion of a data layer, a feature layer and a decision layer according to a fusion level. The feature level fusion is a higher-level fusion, and the fusion technology firstly extracts features of various data sources and then comprehensively analyzes and fuses the features. The common feature level fusion method comprises Bayes estimation, Dempster-Shafer evidence theory, cluster analysis, artificial neural network and the like. The neural network has the functions of distributed storage, parallel processing, self-learning, self-organization and the like of information, integrates and processes a high-dimensional feature space formed by multi-source information features, and can effectively fuse multi-dimensional information for classification and other problems.

When the artificial neural network is used for realizing fusion and classification, a supervised classification method is often used, a large amount of classified standard data is needed, the cost is high, and a large amount of manpower and financial resources are consumed.

Disclosure of Invention

The invention aims to overcome the defects and provides a multi-source remote sensing image ground object classification method based on a two-channel convolution ladder network.

In order to achieve the above object, the present invention comprises the steps of:

step one, performing normalization processing on multispectral data of a plurality of areas to be detected, which are obtained by a landsat-8 sensor, to obtain normalized multispectral data, and marking the normalized multispectral data as landsat _ A, landsat _ B, landsat _ C, … … and landsat _ N;

step two, the multispectral data of a plurality of regions to be detected obtained by the sentinel-2 sensor are normalized to obtain normalized multispectral data, and l is marked as sentinel _ A, sentinel _ B, sentinel _ C, … … and sentinel _ N;

thirdly, taking a peripheral 28 multiplied by 28 block in each element in the landsat _ A to represent an original element value to form a feature matrix L1_ A based on the image block, and similarly, obtaining feature matrices L1_ B, L1_ C, … … and L1_ N;

step four, each element in the sentinel _ A takes a peripheral 28 multiplied by 28 block to represent an original element value to form a characteristic matrix S1_ A based on the image block, and similarly, characteristic matrices S1_ B, S1_ C, … … and S1_ N are obtained;

step five, in the characteristic L1_ A, a plurality of blocks are randomly selected from each type to form a set L2_ A, similarly, sets L2_ B, L2_ C, … … and L2_ N are respectively obtained from L1_ B, L1_ C, … … and L1_ N, and a training data set L is formed from L2_ A, L2_ B, L2_ C, … … and L2_ N;

step six, in S1_ A, selecting blocks corresponding to the blocks in L2_ A to form a set S2_ A, and similarly, obtaining sets S2_ B, S2_ C, … … and S2_ N respectively from S1_ B, S1_ C, … … and S1_ N, and forming a training data set S from S2_ A, S2_ B, S2_ C, … … and S2_ N;

constructing a two-channel convolution ladder network multi-source remote sensing image ground object classification model;

step eight, training the multi-source image ground feature classification model by using a training data set L and a training data set S to obtain a trained model;

and step nine, classifying the test data sets L1_ A, L1_ B, L1_ C, … …, L1_ N, S1_ A, S1_ B, S1_ C, … … and S1_ N by using the trained models to obtain the output of the model corresponding to each pixel point in the test data sets.

In the first step and the second step, the ENVI software is used for carrying out normalization processing on the data, and the selected normalization mode is equal.

In the third step, the specific method for obtaining the feature matrices L1_ A, L1_ B, L1_ C, … … and L1_ N is as follows:

the normalized feature matrix landsat _ a is used to take a block of 28 × 28 pixels around each element to represent the original element value, so that the class label of the block is still the class label of the original element value, and since the feature matrix is 9-dimensional, the size of each block is 28 × 28 × 9, so that the feature matrix L1_ a based on the image block is formed, and similarly, the feature matrices L1_ A, L1_ B, L1_ C, … … and L1_ N are obtained.

In the fourth step, the specific method for obtaining the feature matrices S1_ A, S1_ B, S1_ C, … … and S1_ N is as follows:

the normalized feature matrix sentinel _ a is used to take a block of 28 × 28 pixels around each element to represent the original element value, so that the class label of the block is still the class label of the original element value, and since the feature matrix is 10-dimensional, the size of each block is 28 × 28 × 10, thereby forming the feature matrix S1_ a based on image blocks, and similarly, the feature matrices S1_ A, S1_ B, S1_ C, … … and S1_ N are obtained.

In the fifth step, a specific method for forming the training data set L is as follows:

in L1_ A, samples are randomly disturbed, the first 10% of blocks of each type are selected to form a set L2_ A, similarly, sets L2_ B, L2_ C, … … and L2_ N are respectively obtained from L1_ B, L1_ C, … … and L1_ N, and a training data set L is formed from L2_ A, L2_ B, L2_ C, … … and L2_ N.

In the sixth step, a specific method for forming the training data set S is as follows:

s1_ A is disordered in the same way as L1_ A, the first 10% of blocks of each type are selected to form a set S2_ A, similarly, sets S2_ B, S2_ C … … and S2_ N are obtained from S1_ B, S1_ C, … … and S1_ N respectively, and a training data set S is formed by S2_ A, S2_ B, S2_ C, S … … and S2_ N.

In the seventh step, a specific method for constructing a dual-channel convolution ladder network multisource remote sensing image surface feature classification model is as follows:

first, a network encoder is constructed:

the encoder based on convolution contains clean part and lossy part, and two parts share a set of parameter, and the gaussian noise that the average is 0 standard deviation and is 0.3 is added to each layer of lossy part, and the structure of encoder is: input layer → first convolution layer → second convolution layer → third convolution layer → fourth convolution layer → softmax classifier;

secondly, constructing a decoder:

the decoder reconstructs the lossy part of the encoder for each layer from output to input in turn, the reconstruction function is as follows:

represents the output of the ith neuron of the ith layer of the decoder,

i

1,2_lWherein m is_lRepresents the number of neurons of the l-th layer,

represents the output of the ith neuron of the l-th layer of the lossy part of the encoder,

to represent

The weight of (a) is determined,

it is indicated that a priori,

is a noise reduction function whichIn the formula, i and l are positive integers;

thirdly, constructing a loss function:

C_cis a cross-entropy loss function representing the loss of the supervised part, t (n) represents class labels,

is the output of the lossy part of the encoder, x₁(n) is the input of the first channel in the encoder, x₂(n) is the input of a second channel in the encoder, C_dDenotes the loss of the unsupervised part, λ^lRepresenting the weight of the l layers of reconstruction errors, N representing the number of samples input, m_lNumber of neurons in layer I, z^(l)Represents the output of the ith layer of the clean part of the encoder,

representing the output of the l-th layer reconstructed by the decoder.

In the step eight, a specific method for obtaining the trained model is as follows:

and respectively taking the training data set L and the training data set S as the input of a first channel and a second channel in the multi-source remote sensing image surface feature classification model, taking the category of each pixel point in the training data set as the output of the multi-source image surface feature classification model, and optimizing the network parameters of the classification model by solving the error between the category and the correct category marked manually and performing back propagation on the error to obtain the trained classification model.

In the ninth step, a specific method for obtaining the output of the model corresponding to each pixel point in the test data set is as follows:

the test data sets L1_ A, L1_ B, L1_ C, … … and L1_ N serve as input of a first channel of a trained multi-source remote sensing image ground feature classification model, S1_ A, S1_ B, S1_ C, … … and S1_ N serve as input of a second channel of the trained multi-source remote sensing image ground feature classification model, and output of the model is a classification category obtained by classifying each pixel point in the test data set of the region to be classified.

Compared with the prior art, the method has the advantages that the pixel-level features are expanded into the image block features, so that the spectral band information and the spatial information can be simultaneously obtained, and the method is more favorable for describing the features of different ground objects; according to the invention, the semi-supervised convolution ladder network applied to the natural image classification problem is applied to the multi-source remote sensing image ground feature classification, so that higher ground feature classification precision can be obtained under the condition of only using a small number of samples with class marks, and the consumption of manpower and financial resources is greatly reduced; compared with the traditional data fusion method, the method has the advantages that the fusion characteristics obtained by the method are more abstract and representative, the characteristics of different ground objects can be better described, and the classification precision of the multi-source remote sensing image is favorably improved; according to the method, the ground object target classification is realized by using the multi-source remote sensing data, the information complementation of different data is realized, and more complete characteristics of different ground objects are obtained.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a diagram of an artificial marker of an image to be detected according to the present invention; wherein a is Berlin; b is Paris; c is hong Kong; d is Rome; e is saint paul;

FIG. 3 is a diagram illustrating the classification result of an image to be classified according to the present invention; wherein a is Berlin; b is Paris; c is hong Kong; d is Rome; e is saint paul.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 1, the method of the invention comprises the following steps:

step 1, respectively carrying out normalization processing on multispectral data of five cities, namely Berlin, Paris, hong Kong, Roman and St Paul, obtained by a landsat-8 sensor by using ENVI software to obtain normalized multispectral data which are respectively marked as landsat _ berlin, landsat _ paris, landsat _ hong _ kong, landsat _ color and landsat _ sao _ paulo;

multispectral data of five cities, namely Berlin, Paris, hong Kong, Roman and St.Paul, obtained by a Landsat-8 sensor are all 9 wave bands, and the image sizes are 666 multiplied by 643, 988 multiplied by 1160, 529 multiplied by 528, 447 multiplied by 377 and 871 multiplied by 1067 respectively;

when the ENVI software is used for carrying out normalization processing on the data, the selected normalization mode is equal;

step 2, performing normalization processing on multispectral data of five cities, namely Berlin, Paris, hong Kong, Roman and St.Paul, obtained by a sensor of sentinel-2 by using ENVI software respectively to obtain normalized multispectral data which are respectively marked as sentinel _ berlin, sentinel _ part, sentinel _ hong _ kong, sentinel _ color and sentinel _ sao _ paulo;

multispectral data of five cities of Berlin, Paris, hong Kong, Roman and St Paul obtained by a sentinel-2 sensor are all 10 wave bands, and the image sizes are 666 multiplied by 643, 988 multiplied by 1160, 529 multiplied by 528, 447 multiplied by 377 and 871 multiplied by 1067 respectively;

step 3, taking a peripheral 28 multiplied by 28 block from each element in the landsat _ berlin to represent an original element value, and forming a feature matrix L1_ berlin based on the image block, and similarly, obtaining feature matrices L1_ paris, L1_ hong _ kong, L1_ color and L1_ sao _ paulo;

taking a block with 28 × 28 pixels around each element in the normalized feature matrix landsat _ berlin to represent an original element value, so that the class label of the block is still the class label of the original element value, and since the feature matrix is 9-dimensional, the size of each block is 28 × 28 × 9, so that an image block-based feature matrix L1_ berlin is formed, and similarly, feature matrices L1_ paris, L1_ hong _ kong, L1_ rope, L1_ sao _ paulo are obtained;

step 4, taking a peripheral 28 multiplied by 28 block from each element in the sentinel _ berlin to represent an original element value, and forming a feature matrix S1_ berlin based on the image block, and similarly, obtaining feature matrices S1_ paris, S1_ hong _ kong, S1_ rome and S1_ sao _ paulo;

each element in the normalized feature matrix sentinel _ berlin is used for taking a block with 28 × 28 pixels around to represent the original element value, so that the class mark of the block is still the class mark of the original element value, and the feature matrix is 10-dimensional, so that the size of each block is 28 × 28 × 10, thereby forming an image block-based feature matrix S1_ berlin, and similarly, feature matrices S1_ paris, S1_ hong _ kong, S1_ rope, S1_ sao _ paulo are obtained;

step 5, in the L1_ berlin, a plurality of blocks are randomly selected from each type to form a set L2_ berlin, similarly, sets L2_ paris, L2_ hong _ kong, L2_ rome and L2_ sao _ paulo are respectively obtained from L1_ paris, L1_ hong _ kong, L1_ rome and L1_ sao _ paulo, and a training data set L is formed by L2_ berlin, L2_ paris, L2_ hong _ kong, L2_ rome and L2_ sao _ paulo;

in L1_ berlin, randomly scrambling samples, selecting the first 10% of blocks of each type to form a set L2_ berlin, and similarly, respectively obtaining sets L2_ paris, L2_ hong _ kong, L2_ rome and L2_ sao _ paulo from L1_ paris, L1_ hong _ kong, L2_ rome and L2_ sao _ paulo, and forming a training data set L from L2_ berlin, L2_ paris, L2_ hong _ kong, L2_ rome and L2_ sao _ paulo;

step 6, in the S1_ berlin, selecting blocks corresponding to the L2_ berlin to form a set S2_ berlin, and similarly, obtaining sets S2_ paris, S2_ hong _ kong, S2_ rome and S2_ sao _ paulo by S1_ paris, S1_ hong _ kong, S1_ rome and S1_ sao _ paulo respectively, and forming a training data set S by S2_ berlin, S2_ paris, S2_ hong _ kong, S2_ rome and S2_ sao _ paulo;

the S1_ berlin is disturbed in the same way as the L1_ berlin, the first 10% of blocks of each type are selected to form a set S2_ berlin, and similarly, the sets S2_ paris, S2_ hong _ kong, S2_ rome and S2_ sao _ paulo are obtained by S1_ paris, S1_ hong _ kong, S1_ rock and S1_ sao _ paulo respectively, and a training data set S is formed by S2_ berlin, S2_ paris, S2_ hong _ kong, S2_ rock and S2_ sao _ paulo;

step 7, constructing a dual-channel convolution ladder network multi-source remote sensing image ground feature classification model, comprising the following steps of:

(7a) constructing a network encoder:

the convolution-based encoder contains a clean part and a lossy part, both of which share a set of parameters, and the lossy part adds 0 standard deviation 0.3 gaussian noise to each layer. The structure of the encoder is as follows: input layer → first convolution layer → second convolution layer → third convolution layer → fourth convolution layer → softmax classifier, each layer parameter is as follows:

first layer input layer: setting the number of feature maps to be 9 for the first channel and setting the number of feature maps to be 10 for the second channel;

a second first winding layer: the number of feature maps of the first channel and the second channel is 100;

third layer second convolution layer: the number of feature maps of the first channel and the second channel is 100;

fourth layer third convolutional layer: the number of feature maps of the first channel and the second channel is 17;

fifth layer fourth buildup layer: setting the number of feature maps to be 17;

a sixth-level softmax classifier: setting the number of feature maps to be 17;

(7b) constructing a decoder:

denotes the i (i ═ 1, 2. -, m) th layer of the decoder_l) The output of each neuron, wherein m_lRepresents the number of neurons of the l-th layer,

to represent

The weight of (a) is determined,

it is indicated that a priori,

is a noise reduction function, where i, l are both positive integers;

(7c) constructing a loss function:

is the output of the lossy part of the encoder, x₁(n) is the input of the first channel in the encoder, x₂(n) is an input to a second channel in the encoder. C_dDenotes the loss of the unsupervised part, λ^lRepresenting the weight of the l layers of reconstruction errors, N representing the number of samples input, m_lNumber of neurons in layer I, z^(l)Represents the output of the ith layer of the clean part of the encoder,

representing the output of the l-th layer reconstructed by the decoder.

Step 8, training the dual-channel convolution ladder network multisource remote sensing image ground object classification model by using a training data set L and a training data set S to obtain a trained model;

respectively taking the training data sets L and S as the input of a first channel and a second channel in the dual-channel convolution ladder network multi-source remote sensing image ground feature classification model, taking the category of each pixel point in the training data sets as the output of the dual-channel convolution ladder network multi-source remote sensing image ground feature classification model, optimizing the network parameters of the classification model by solving the error between the category and the correct category of the artificial mark and performing back propagation on the error to obtain the trained classification model, wherein the correct category of the artificial mark is shown in figure 2.

And 9, classifying the test data sets L1_ berlin, L1_ paris, L1_ hong _ kong, L1_ round, L1_ sao _ paulo, S1_ bnerlin, S1_ paris, S1_ hong _ kong, S1_ round and S1_ sao _ paulo by using the trained model to obtain the output of the model corresponding to each pixel point in the test data sets.

The test data sets L1_ berlin, L1_ paris, L1_ hong _ kong, L1_ ring and L1_ sao _ paulo are used as the input of a first channel of a trained dual-channel convolution ladder network multi-source remote sensing image ground feature classification model, S1_ berlin, S1_ paris, S1_ hong _ kong, S1_ ring and S1_ sao _ paulo are used as the input of a second channel of the trained dual-channel convolution ladder network ground feature classification model, and the output of the model is a classification category obtained by classifying each pixel point in the test data sets of five cities.

The effect of the invention can be further illustrated by the following simulation experiment:

1. simulation conditions are as follows:

the hardware platform is as follows: HP-Z820.

The software platform is as follows: tensorflow.

2. Simulation content and results:

the experiment is carried out under the simulation condition by using the method, namely 10% of pixel points are randomly selected as training samples from multispectral data of five cities of Berlin, Paris, hong Kong, Roman and St Paul obtained from a landsat-8 sensor and a sentinel-2 sensor, and all the other marked pixel points are used as test samples to obtain the classification result shown in the figure 3. The data used in this experiment were classified into 16 types, which were: 1-dense high-rise buildings, 2-dense medium-rise buildings, 3-dense low-rise buildings, 4-open high-rise buildings, 5-open medium-rise buildings, 6-open low-rise buildings, 8-large low-rise buildings, 9-sparsely distributed buildings, 10-heavy industrial areas, 11-dense forests, 12-scattered trees, 13-shrubs and dwarf trees, 14-low vegetation, 15-bare rock, 16-bare soil and sandy soil, 17-water.

As can be seen from fig. 3: except for a few of wrongly-divided spots, each part has a good classified result of the classified mark area, the classified result of the water area is the best, the wrongly-divided spots exist in the classified result due to the similar characteristics of various buildings, and the classified mark areas of each part have good classified results.

The size of the training sample without the class marks is fixed to be 10 percent (about 8000 samples), the number of the samples with the class marks in the training sample is changed to make the samples with the class marks account for 10 percent, 5 percent and 3 percent of the total number, the classification precision of the test data set of the convolution ladder network with the single channel is compared, and the result is shown in a table 1:

TABLE 1

As can be seen from Table 1, when the training samples contain the class standard samples accounting for 10%, 5% and 3% of the total labeled samples, the classification accuracy of the test data set of each city is higher than the classification accuracy obtained by classifying the data of the landsat-8 or sentinel-2 sensor by using the single-channel convolution ladder network.

In conclusion, the multi-source remote sensing image is fused by using the two-channel convolution ladder network, so that classification is realized, more accurate and complete information is extracted, the expression capability of image characteristics is effectively improved, the generalization capability of a model is enhanced, and higher ground feature classification accuracy can be still achieved under the condition of few training samples.

Claims

1. The multi-source remote sensing image ground object classification method based on the two-channel convolution ladder network is characterized by comprising the following steps of:

step two, the multispectral data of a plurality of regions to be detected obtained by the sentinel-2 sensor are normalized to obtain normalized multispectral data which are marked as sentinel _ A, sentinel-B, sentinel _ C, … … and sentinel _ N;

step five, in a feature matrix L1_ A, randomly selecting a plurality of blocks of each type to form a set L2_ A, similarly, obtaining sets L2_ B, L2_ C, … … and L2_ N respectively from L1_ B, L1_ C, … … and L1_ N, and forming a training data set L from L2_ A, L2_ B, L2_ C, … … and L2_ N;

step six, selecting blocks corresponding to the blocks in the L2_ A from the feature matrix S1_ A to form a set S2_ A, and similarly, obtaining sets S2_ B, S2_ C, … … and S2_ N from S1_ B, S1_ C, … … and S1_ N respectively, and forming a training data set S from S2_ A, S2_ B, S2_ C, … … and S2_ N;

step seven, constructing a two-channel convolution ladder network multi-source remote sensing image ground object classification model, wherein the specific method comprises the following steps:

first, a network encoder is constructed:

secondly, constructing a decoder:

represents the output of the ith neuron of the ith layer of the decoder, i 1,2_lWherein m is_lRepresents the number of neurons of the l-th layer,

to represent

The weight of (a) is determined,

it is indicated that a priori,

is a noise reduction function, where i, l are both positive integers;

thirdly, constructing a loss function:

represents the output of the l-th layer reconstructed by the decoder;

2. The multi-source remote sensing image terrain classification method based on the two-channel convolution ladder network as claimed in claim 1, wherein in the first step and the second step, ENVI software is used for carrying out normalization processing on data, and the selected normalization mode is equal.

3. The method for classifying land features of multi-source remote sensing images based on the dual-channel convolution ladder network as claimed in claim 1, wherein in the third step, the specific method for obtaining feature matrices L1_ A, L1_ B, L1_ C, … … and L1_ N is as follows:

4. The method for classifying land features of multi-source remote sensing images based on the dual-channel convolution ladder network as claimed in claim 1, wherein in the fourth step, the specific method for obtaining feature matrices S1_ A, S1_ B, S1_ C, … … and S1_ N is as follows:

5. The multi-source remote sensing image surface feature classification method based on the two-channel convolution ladder network according to claim 1, characterized in that in the fifth step, a specific method for forming a training data set L is as follows:

6. The multi-source remote sensing image surface feature classification method based on the two-channel convolution ladder network according to claim 1 is characterized in that in the sixth step, a specific method for forming a training data set S is as follows:

7. The multi-source remote sensing image surface feature classification method based on the two-channel convolution ladder network as claimed in claim 1, wherein in the eighth step, a specific method for obtaining the trained model is as follows:

8. The multi-source remote sensing image ground object classification method based on the two-channel convolution ladder network according to claim 1 is characterized in that in the ninth step, a specific method for obtaining the output of the model corresponding to each pixel point in the test data set is as follows: