CN110414494B - SAR image classification method with ASPP deconvolution network


Publication number
CN110414494B
Authority
CN
China
Prior art keywords
test
aspp
image
layer
training
Prior art date
Legal status
Active
Application number
CN201910829626.6A
Other languages
Chinese (zh)
Other versions
CN110414494A (en)
Inventor
王英华
刘睿
刘宏伟
张磊
王聪
贾少鹏
秦庆喜
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date: 2019-01-25
Filing date: 2019-09-03
Publication date: 2022-12-02
Application filed by Xidian University
Publication of CN110414494A
Application granted
Publication of CN110414494B

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411: Pattern recognition; classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 20/13: Scenes; terrestrial scenes; satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an SAR image classification method with an ASPP deconvolution network, which mainly addresses the low classification accuracy, low classification efficiency and complex operation of existing SAR image classification methods. The method comprises the following specific steps: (1) reading a training image and a test image; (2) generating a training set and test sets; (3) constructing a deconvolution network with atrous spatial pyramid pooling (ASPP); (4) training the deconvolution network with ASPP; and (5) classifying the test image. The method solves the problems that existing SAR image classification algorithms cannot fully exploit multi-scale target information and cannot perform end-to-end classification, and it offers high classification accuracy, high classification speed and simple operation.

Description

SAR image classification method with ASPP deconvolution network
Technical Field
The invention belongs to the technical field of image processing, and further relates to a synthetic aperture radar (SAR) image classification method with an atrous spatial pyramid pooling (ASPP) deconvolution network in the technical field of image classification. The method can be used to classify ground object targets in SAR images, such as building areas, farmland areas and airport runway areas.
Background
Synthetic aperture radar (SAR) imaging has been widely studied owing to its all-day, all-weather operation and strong penetration capability. As SAR imaging technology matures, the resolution of the obtained images keeps increasing. SAR image classification is an important branch of the radar image processing field: it assigns different ground object targets in an SAR image, such as building areas, farmland areas and airport runway areas, to their classes using distinct texture features. The features selected for SAR image classification determine the quality of the classification result; common texture features include the mean, variance, entropy, energy and gray-level co-occurrence matrix. Deep learning methods can extract effective features automatically, removing the difficulty of hand-crafting features in traditional methods, and are now widely applied to SAR image classification. Common deep learning classification methods include those based on convolutional neural networks, deep belief networks and deconvolution neural networks.
The patent document "SAR image classification method based on superpixel segmentation and convolution-deconvolution network" filed by Xidian University (application No. 201810512092.X, publication No. CN108764330A) discloses an SAR image classification method based on superpixel segmentation and a convolution-deconvolution network. The method first feeds the SAR image data into a convolution-deconvolution network to obtain an initial classification result, and then combines a superpixel segmentation of the SAR image with the initial result to obtain the final classification. Using the superpixel segmentation to smooth the initial classification shortens the long classification time of deep-learning-based SAR classification and improves the classification precision. However, because the SAR image data are simply fed into a convolution-deconvolution neural network, the multi-scale target information of the SAR image is not fully exploited, and the classification accuracy is still not high enough.
Hefei University of Technology, in the patent document "SAR image classification method based on texture features and DBN" (application No. 201710652513.4, publication No. CN107506699A), discloses an SAR image classification method based on texture features and a deep belief network (DBN). The method feeds the GLCM features of gray image blocks, the GMRF features of raw data image blocks and the intensity information of the image into a DBN to obtain the classification result. Introducing texture features to assist DBN classification overcomes the limitation of deep-learning-based SAR classification methods that rely only on image intensity, and improves the classification precision. However, because the texture features merely assist the DBN and the classification result of the image to be classified is not obtained directly, end-to-end classification cannot be achieved and the operation is complex; moreover, each test yields the classification result of only one pixel, which hinders engineering implementation and leads to low classification efficiency.
Disclosure of Invention
The invention aims to provide an SAR image classification method with an ASPP deconvolution network that addresses the above shortcomings of the prior art, chiefly the low classification accuracy, complex operation and low classification efficiency of existing SAR image classification.
The idea for realizing the purpose of the invention is as follows: read a training image and a test image; extract a training data set from the training image with a sliding window and expand the data to generate a training set; slide the window from different starting positions of the test image to generate test sets; construct a deconvolution network with atrous spatial pyramid pooling (ASPP); feed the training set into the deconvolution network and train until the network parameters converge, yielding a trained deconvolution network; and feed the test sets with different starting positions into the trained deconvolution network for classification to obtain the final classification result.
The method comprises the following specific steps:
(1) Reading a training image and a test image:
reading two SAR images containing the same types of ground object targets as the training image and the test image respectively, wherein each image contains at least three classes of ground object targets, and for any two classes of ground object targets in the training image the ratio of the pixel count of the larger class to that of the smaller class is kept within the range 1 to 1.5;
(2) Generating a training set and test sets:
(2a) Constructing a sliding window of size 90 × 90 pixels;
(2b) Placing the sliding window at the upper-left corner of the training image and sliding it from left to right and from top to bottom with a step of 10 pixels, taking each training image region covered by the sliding window as a training image pixel block;
(2c) Counting, within each training image pixel block, the number of pixels of each class, and when the count of a class is at least 0.5 times the total number of pixels in the sliding window, taking that class as the class of the pixel block to form a sample;
(2d) Forming the training data set of each class from all samples of that class;
(2e) Randomly shuffling each class's training data set and randomly selecting the same number of samples per class to form an initial training sample set, wherein no fewer than 2500 training samples are selected per class;
(2f) Rotating each training sample in the initial training sample set 90 degrees clockwise about its central pixel to obtain a first extended set;
(2g) Adding Gaussian noise with mean 0 and standard deviation 0.1 to each training sample in the initial training sample set to obtain a second extended set;
(2h) Forming the training set from the initial training sample set, the first extended set and the second extended set;
(2i) Placing the sliding window at 5 different starting positions of the test image and sliding it from left to right and from top to bottom with a step of 90 pixels, taking each test image region covered by the sliding window as a test image pixel block, to form a first, second, third, fourth and fifth test set respectively;
(3) Constructing a deconvolution network with atrous spatial pyramid pooling (ASPP):
(3a) Building a deconvolution network with the structure: input layer → convolutional layer 1 → pooling layer → nested module → splicing layer 1 → convolutional layer 2 → upsampling layer → output layer;
the nested module consists of a first ASPP module and branch 1 in parallel, and branch 1 is, in order, convolutional layer 3 → nested submodule 1 → splicing layer 2 → convolutional layer 4 → deconvolution layer 1;
nested submodule 1 consists of a second ASPP module and branch 2 in parallel, and branch 2 is, in order, convolutional layer 5 → nested submodule 2 → splicing layer 3 → convolutional layer 6 → deconvolution layer 2;
nested submodule 2 consists of a third ASPP module and branch 3 in parallel, and branch 3 is, in order, convolutional layer 7 → the fourth ASPP module → deconvolution layer 3;
(3b) Setting the parameters of the deconvolution network;
(4) Training the deconvolution network with ASPP:
inputting the training set into the deconvolution network with ASPP and training it until the network parameters converge, obtaining a trained deconvolution network;
(5) Classifying the test image:
(5a) Inputting the first, second, third, fourth and fifth test sets in turn into the trained deconvolution network to obtain the initial classification pixel blocks of each test set;
(5b) For each pixel, finding the most frequent label at the corresponding position across all initial classification pixel blocks and taking it as the class of that pixel to obtain a final classification pixel block, which is taken as the final classification result of the test image.
Compared with the prior art, the invention has the following advantages:
First, because the method applies four atrous spatial pyramid pooling (ASPP) modules to perform spatial pyramid pooling on the feature maps, it overcomes the low classification accuracy caused by the under-utilization of multi-scale target information in prior-art algorithms; the invention thus improves the accuracy of SAR image classification.
Second, because an upsampling layer performs the upsampling operation on the feature map, the invention obtains the classification result of an entire image pixel block at once, overcoming the low efficiency of prior-art algorithms that yield the classification result of only one pixel per test; the invention thus improves the efficiency of SAR image classification.
Third, because the test sets are input directly into the trained deconvolution network to obtain their initial classification pixel blocks, the invention overcomes the inability of prior-art algorithms to perform end-to-end classification, making the SAR image classification operation of the invention simple.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a network architecture of the present invention;
FIG. 3 shows the measured SAR images used in the simulation experiment of the present invention;
FIG. 4 shows the ground-truth label maps of the measured SAR images used in the simulation experiment of the present invention;
FIG. 5 shows the classification results of the simulation experiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The specific steps of the present invention are further described with reference to Fig. 1.
Step 1, reading a training image and a test image.
Read two SAR images containing the same types of ground object targets as the training image and the test image respectively, where each image contains at least three classes of ground object targets, and for any two classes of ground object targets in the training image the ratio of the pixel count of the larger class to that of the smaller class is kept within the range 1 to 1.5.
Step 2, generating a training set and test sets.
Construct a sliding window of size 90 × 90 pixels.
Place the sliding window at the upper-left corner of the training image and slide it from left to right and from top to bottom with a step of 10 pixels, taking each training image region covered by the sliding window as a training image pixel block.
Count, within each training image pixel block, the number of pixels of each class; when the count of a class is at least 0.5 times the total number of pixels in the sliding window, take that class as the class of the pixel block to form a sample.
All samples of each class are formed into a training data set for that class.
Randomly shuffle each class's training data set and randomly select the same number of samples per class to form the initial training sample set, selecting no fewer than 2500 training samples per class.
Rotate each training sample in the initial training sample set 90 degrees clockwise about its central pixel to obtain the first extended set.
Add Gaussian noise with mean 0 and standard deviation 0.1 to each training sample in the initial training sample set to obtain the second extended set.
Form the training set from the initial training sample set, the first extended set and the second extended set.
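For illustration, the patch extraction and data expansion above (steps 2a to 2h) can be summarised in a short NumPy sketch. This is a minimal sketch under stated assumptions: the training image and its label map are 2-D arrays of the same size with integer class labels starting at 0, all names are illustrative, and the per-class balancing of step (2e) is omitted for brevity.

import numpy as np

def extract_training_patches(image, label_map, win=90, stride=10, num_classes=3):
    """Steps (2a)-(2c): slide a win x win window over the training image with
    the given stride; keep a block as a sample when one class covers at least
    0.5 times the window's pixels, and label the block with that class."""
    patches, patch_labels = [], []
    h, w = image.shape
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            block = label_map[r:r + win, c:c + win]
            counts = np.bincount(block.ravel(), minlength=num_classes)
            dominant = int(counts.argmax())
            if counts[dominant] >= 0.5 * win * win:
                patches.append(image[r:r + win, c:c + win])
                patch_labels.append(dominant)
    return np.asarray(patches), np.asarray(patch_labels)

def expand_training_set(samples, seed=0):
    """Steps (2f)-(2h): first extended set by rotating each (N, 90, 90) sample
    90 degrees clockwise about its centre, second extended set by adding
    Gaussian noise (mean 0, standard deviation 0.1), then combining all three."""
    rng = np.random.default_rng(seed)
    rotated = np.rot90(samples, k=-1, axes=(1, 2))
    noisy = samples + rng.normal(0.0, 0.1, samples.shape)
    return np.concatenate([samples, rotated, noisy], axis=0)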
Place the sliding window at 5 different starting positions of the test image and slide it from left to right and from top to bottom with a step of 90 pixels, taking each test image region covered by the sliding window as a test image pixel block, to form the first, second, third, fourth and fifth test sets respectively.
Placing the sliding window at 5 different starting positions of the test image means: the test set obtained with the upper-left corner of the test image as the starting position is the first test set; the test set obtained with the starting position shifted 30 pixels to the right of the upper-left corner is the second test set; the test set obtained with the starting position shifted 60 pixels to the right is the third test set; the test set obtained with the starting position shifted 30 pixels down is the fourth test set; and the test set obtained with the starting position shifted 60 pixels down is the fifth test set.
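Continuing the NumPy sketch above, step (2i) might look as follows; the five starting offsets correspond to the five starting positions just listed, and only windows lying fully inside the image are kept.

def extract_test_sets(image, win=90):
    """Step (2i): build five test sets, one per (row, col) starting offset,
    sliding the window with a step equal to the window size."""
    offsets = [(0, 0), (0, 30), (0, 60), (30, 0), (60, 0)]
    h, w = image.shape
    test_sets = []
    for dr, dc in offsets:
        blocks = [image[r:r + win, c:c + win]
                  for r in range(dr, h - win + 1, win)
                  for c in range(dc, w - win + 1, win)]
        test_sets.append(np.asarray(blocks))
    return test_sets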
Step 3, constructing the deconvolution network with atrous spatial pyramid pooling (ASPP).
The network architecture of the present invention is further described with reference to Fig. 2.
Build a deconvolution network with the structure: input layer → convolutional layer 1 → pooling layer → nested module → splicing layer 1 → convolutional layer 2 → upsampling layer → output layer.
The nested module consists of a first ASPP module and branch 1 in parallel; branch 1 is, in order, convolutional layer 3 → nested submodule 1 → splicing layer 2 → convolutional layer 4 → deconvolution layer 1.
Nested submodule 1 consists of a second ASPP module and branch 2 in parallel; branch 2 is, in order, convolutional layer 5 → nested submodule 2 → splicing layer 3 → convolutional layer 6 → deconvolution layer 2.
Nested submodule 2 consists of a third ASPP module and branch 3 in parallel; branch 3 is, in order, convolutional layer 7 → the fourth ASPP module → deconvolution layer 3.
Each of the first, second, third and fourth ASPP modules has the structure: 4 parallel branches → splicing layer → convolutional layer, where the 1st, 2nd and 3rd of the 4 parallel branches are atrous convolutional layers, and the 4th branch is, in order, a pooling layer → an upsampling layer.
The parameters of the atrous spatial pyramid pooling (ASPP) modules are set as follows:
The dilation rates of the atrous convolutional layers of the 1st, 2nd and 3rd branches are set to 2, 4 and 8 pixels respectively; each atrous convolution kernel is 3 × 3, and each sliding step is 1 pixel.
The pooling mode of the pooling layer of the 4th branch is set to global average pooling.
The upsampling layer of the 4th branch is set to bilinear interpolation upsampling.
The convolution kernel size of the convolutional layer is set to 1 × 1, with a sliding step of 1 pixel.
The splicing layer is set to a matrix concatenation function.
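For illustration, one ASPP module with the structure and parameters just described can be sketched as follows. PyTorch is an assumption here, since the patent does not name a framework; the channel widths are left as arguments, and the pooling branch keeps the input channel count because the structure described above places no convolution in the 4th branch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """One atrous spatial pyramid pooling module: 4 parallel branches ->
    splicing (concatenation) layer -> 1x1 convolutional layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Branches 1-3: 3x3 atrous convolutions, dilation rates 2, 4, 8, stride 1.
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1,
                      padding=rate, dilation=rate)
            for rate in (2, 4, 8)
        ])
        # Branch 4: global average pooling, restored by bilinear upsampling.
        self.gap = nn.AdaptiveAvgPool2d(1)
        # Splice the 4 branches, then a 1x1 convolution with stride 1.
        self.fuse = nn.Conv2d(3 * out_ch + in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.atrous]
        pooled = F.interpolate(self.gap(x), size=size,
                               mode='bilinear', align_corners=False)
        return self.fuse(torch.cat(feats + [pooled], dim=1))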
Set the parameters of the deconvolution network.
The parameters of the deconvolution network are set as follows:
The number of feature maps of convolutional layer 1 is set to 32, with a 5 × 5 convolution kernel and a sliding step of 1 pixel.
The number of feature maps of convolutional layer 2 is set to the total number of ground object target classes contained in the SAR image to be classified, with a 1 × 1 convolution kernel and a sliding step of 1 pixel.
The numbers of feature maps of convolutional layers 3, 5 and 7 are set to 64, 128 and 256 respectively; each convolution kernel is 5 × 5, and each sliding step is 2 pixels.
The numbers of feature maps of convolutional layers 4 and 6 are set to 64 and 128 respectively; each convolution kernel is 1 × 1, and each sliding step is 1 pixel.
The pooling mode of the pooling layer is set to max pooling, with 32 feature maps, a 2 × 2 pooling window and a sliding step of 2 pixels.
The numbers of feature maps of deconvolution layers 1, 2 and 3 are set to 64, 128 and 256 respectively; each convolution kernel is 5 × 5, and each sliding step is 2 pixels.
The numbers of feature maps of the first, second, third and fourth ASPP modules are set to 32, 64, 128 and 256 respectively.
Splicing layers 1, 2 and 3 are each set to a matrix concatenation function.
The upsampling layer is set to bilinear interpolation upsampling.
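Putting the pieces together, the following sketch wires up the full network of step (3a) with the parameter settings above, reusing the ASPP class and imports from the previous sketch. Because stride-2 convolutions and deconvolutions do not reproduce odd feature-map sizes exactly, the sketch bilinearly resizes each deconvolution output to match its parallel ASPP path before splicing; this alignment is an implementation assumption, not something the patent specifies.

class ASPPDeconvNet(nn.Module):
    """Deconvolution network with nested ASPP modules per step (3a):
    input -> conv1 -> pooling -> nested module -> splice 1 -> conv2 -> upsample."""
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5, stride=1, padding=2)   # 32 maps, 5x5
        self.pool = nn.MaxPool2d(2, stride=2)                   # 2x2 max pooling
        self.aspp1 = ASPP(32, 32)     # first ASPP, parallel with branch 1
        self.aspp2 = ASPP(64, 64)     # second ASPP, parallel with branch 2
        self.aspp3 = ASPP(128, 128)   # third ASPP, parallel with branch 3
        self.aspp4 = ASPP(256, 256)   # fourth ASPP, inside branch 3
        self.conv3 = nn.Conv2d(32, 64, 5, stride=2, padding=2)
        self.conv5 = nn.Conv2d(64, 128, 5, stride=2, padding=2)
        self.conv7 = nn.Conv2d(128, 256, 5, stride=2, padding=2)
        self.deconv3 = nn.ConvTranspose2d(256, 256, 5, stride=2, padding=2)
        self.conv6 = nn.Conv2d(128 + 256, 128, 1)   # after splicing layer 3
        self.deconv2 = nn.ConvTranspose2d(128, 128, 5, stride=2, padding=2)
        self.conv4 = nn.Conv2d(64 + 128, 64, 1)     # after splicing layer 2
        self.deconv1 = nn.ConvTranspose2d(64, 64, 5, stride=2, padding=2)
        self.conv2 = nn.Conv2d(32 + 64, num_classes, 1)  # after splicing layer 1

    @staticmethod
    def _match(x, ref):
        # Align a deconvolution output to its parallel ASPP path before splicing.
        return F.interpolate(x, size=ref.shape[-2:], mode='bilinear',
                             align_corners=False)

    def forward(self, x):                       # x: (N, 1, 90, 90)
        x = self.pool(self.conv1(x))            # -> (N, 32, 45, 45)
        s1, d1 = self.aspp1(x), self.conv3(x)
        s2, d2 = self.aspp2(d1), self.conv5(d1)
        s3, d3 = self.aspp3(d2), self.conv7(d2)
        u3 = self._match(self.deconv3(self.aspp4(d3)), s3)
        u2 = self._match(self.deconv2(self.conv6(torch.cat([s3, u3], 1))), s2)
        u1 = self._match(self.deconv1(self.conv4(torch.cat([s2, u2], 1))), s1)
        logits = self.conv2(torch.cat([s1, u1], 1))
        return F.interpolate(logits, scale_factor=2, mode='bilinear',
                             align_corners=False)   # back to 90 x 90

For a 90 × 90 input block, ASPPDeconvNet(num_classes=3)(torch.randn(1, 1, 90, 90)) returns a (1, 3, 90, 90) score map, one score per class per pixel, which matches the block-wise classification described above.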
Step 4, training the deconvolution network with atrous spatial pyramid pooling (ASPP).
Input the training set into the deconvolution network with ASPP and train it until the network parameters converge, obtaining the trained deconvolution network.
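The patent requires only that training continue until the network parameters converge, so the optimiser, learning rate, batch size and epoch count in the following minimal sketch are illustrative assumptions, as is the use of per-pixel label maps obtained by broadcasting each block's class from step (2c) over the 90 × 90 window.

import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train(net, patches, labels, epochs=50, lr=1e-3, device='cpu'):
    """patches: (N, 1, 90, 90) float tensor; labels: (N, 90, 90) long tensor
    holding each block's class broadcast over the window (step 2c)."""
    net.to(device).train()
    loader = DataLoader(TensorDataset(patches, labels),
                        batch_size=32, shuffle=True)
    opt = optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()   # (N, C, 90, 90) logits vs (N, 90, 90) labels
    for _ in range(epochs):
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            opt.zero_grad()
            loss_fn(net(xb), yb).backward()
            opt.step()
    return net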
Step 5, classifying the test image.
Input the first, second, third, fourth and fifth test sets in turn into the trained deconvolution network to obtain the initial classification pixel blocks of each test set.
For each pixel, find the most frequent label at the corresponding position across all initial classification pixel blocks and take it as the class of that pixel to obtain the final classification pixel block, which is taken as the final classification result of the test image.
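A minimal NumPy sketch of the per-pixel majority vote of step (5b), assuming the initial classification pixel blocks of each test set have already been reassembled into five full-size label maps (names illustrative):

import numpy as np

def majority_vote(label_maps):
    """label_maps: list of five (H, W) integer label maps, one per test set.
    Returns the per-pixel most frequent label as the final classification."""
    stacked = np.stack(label_maps)                   # (5, H, W)
    num_classes = int(stacked.max()) + 1
    # Count, for every pixel, how often each label occurs across the 5 maps.
    votes = np.zeros((num_classes,) + stacked.shape[1:], dtype=np.int32)
    for c in range(num_classes):
        votes[c] = (stacked == c).sum(axis=0)
    return votes.argmax(axis=0)                      # (H, W) final label map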
The effects of the present invention can be specifically explained by the following simulation experiments.
1. Simulation experiment conditions:
The simulation experiment of the invention was carried out in a hardware environment with an Intel Xeon(R) CPU E3-1535M v5 at 2.9 GHz and 31.1 GB of memory, and a software environment of MATLAB R2017a and Python 2.7.
2. Simulation content and result analysis:
the simulation experiment of the invention is to respectively classify one test image in the measured SAR images by adopting the method and two prior arts (a support vector machine SVM classification method based on the MLPH characteristic of the multilevel local mode histogram and a classification method based on the superpixel segmentation and the convolution deconvolution network).
Fig. 3 shows the measured SAR images used in the simulation experiment, where Fig. 3(a) is the measured training image and Fig. 3(b) is the measured test image.
Fig. 4 shows the ground-truth label maps of the measured SAR images used in the simulation experiment, where Fig. 4(a) is the ground-truth label map of the measured training image in Fig. 3(a) and Fig. 4(b) is the ground-truth label map of the measured test image in Fig. 3(b).
Fig. 5 shows the classification results of the simulation experiment, where Fig. 5(a) is the result of classifying the measured test image in Fig. 3(b) with the method of the invention, Fig. 5(b) is the result of the prior-art SVM classification method based on MLPH features, and Fig. 5(c) is the result of the prior-art classification method based on superpixel segmentation and a convolution-deconvolution network. Comparing Figs. 5(a), 5(b) and 5(c), the method of the invention yields smoother boundaries on the measured test image in Fig. 3(b); and with reference to the ground-truth label map in Fig. 4(b), the result in Fig. 5(a) is closer to the ground truth than the prior-art results in Figs. 5(b) and 5(c), showing that the classification of the invention is more accurate and its classification effect better.
Table 1 compares the performance of the method of the invention and the two prior-art methods in classifying the measured test image in Fig. 3(b), where F1 denotes the SVM classification method based on MLPH features, F2 the classification method based on superpixel segmentation and a convolution-deconvolution network, and F3 the method of the invention. The classification accuracy in the table is the ratio of the number of correctly classified pixels to the total number of pixels in the test image, a pixel being correctly classified when its classification result matches the corresponding ground-truth label. The average classification accuracy is the mean of the classification accuracies over all ground object target classes, and the run time is the time taken to classify the measured test image in Fig. 3(b).
Table 1. Performance comparison of the three classification methods on the test image
(Table 1 is reproduced in the original publication as an image and is not available here.)
As Table 1 shows, compared with the two prior-art methods, the method of the invention achieves a higher average classification accuracy and the shortest run time on the measured test image in Fig. 3(b). The invention improves classification accuracy by extracting multi-scale target information of the SAR image with the four ASPP modules, realizes end-to-end classification by using the deconvolution and upsampling layers to obtain the classification result of an SAR image pixel block in each test, shortens the run time, and facilitates engineering implementation.

Claims (5)

1. An SAR image classification method with an ASPP deconvolution network, characterized in that a training image and a test image are read, a training set and test sets are generated, a deconvolution network with atrous spatial pyramid pooling (ASPP) is constructed and trained, and the test image is classified, the method comprising the following steps:
(1) Reading a training image and a test image:
reading two SAR images containing the same types of ground object targets as the training image and the test image respectively, wherein each image contains at least three classes of ground object targets, and for any two classes of ground object targets in the training image the ratio of the pixel count of the larger class to that of the smaller class is kept within the range 1 to 1.5;
(2) Generating a training set and a testing set:
(2a) Constructing a sliding window of size 90 × 90 pixels;
(2b) Placing the sliding window at the upper-left corner of the training image and sliding it from left to right and from top to bottom with a step of 10 pixels, taking each training image region covered by the sliding window as a training image pixel block;
(2c) Counting, within each training image pixel block, the number of pixels of each class, and when the count of a class is at least 0.5 times the total number of pixels in the sliding window, taking that class as the class of the pixel block to form a sample;
(2d) Forming the training data set of each class from all samples of that class;
(2e) Randomly shuffling each class's training data set and randomly selecting the same number of samples per class to form an initial training sample set, wherein no fewer than 2500 training samples are selected per class;
(2f) Rotating each training sample in the initial training sample set 90 degrees clockwise about its central pixel to obtain a first extended set;
(2g) Adding Gaussian noise with mean 0 and standard deviation 0.1 to each training sample in the initial training sample set to obtain a second extended set;
(2h) Forming the training set from the initial training sample set, the first extended set and the second extended set;
(2i) Placing the sliding window at 5 different starting positions of the test image and sliding it from left to right and from top to bottom with a step of 90 pixels, taking each test image region covered by the sliding window as a test image pixel block, to form a first, second, third, fourth and fifth test set respectively;
(3) Constructing a deconvolution network with atrous spatial pyramid pooling (ASPP):
(3a) Building a deconvolution network with the structure: input layer → convolutional layer 1 → pooling layer → nested module → splicing layer 1 → convolutional layer 2 → upsampling layer → output layer;
the nested module consists of a first ASPP module and branch 1 in parallel, and branch 1 is, in order, convolutional layer 3 → nested submodule 1 → splicing layer 2 → convolutional layer 4 → deconvolution layer 1;
nested submodule 1 consists of a second ASPP module and branch 2 in parallel, and branch 2 is, in order, convolutional layer 5 → nested submodule 2 → splicing layer 3 → convolutional layer 6 → deconvolution layer 2;
nested submodule 2 consists of a third ASPP module and branch 3 in parallel, and branch 3 is, in order, convolutional layer 7 → the fourth ASPP module → deconvolution layer 3;
(3b) Setting parameters of the deconvolution network;
(4) Training the deconvolution network with ASPP:
inputting the training set into the deconvolution network with ASPP and training it until the network parameters converge, obtaining a trained deconvolution network;
(5) Classifying the test image:
(5a) Inputting the first, second, third, fourth and fifth test sets in turn into the trained deconvolution network to obtain the initial classification pixel blocks of each test set;
(5b) For each pixel, finding the most frequent label at the corresponding position across all initial classification pixel blocks and taking it as the class of that pixel to obtain a final classification pixel block, which is taken as the final classification result of the test image.
2. The SAR image classification method with an ASPP deconvolution network according to claim 1, wherein placing the sliding window at 5 different starting positions of the test image in step (2i) means: the test set obtained with the upper-left corner of the test image as the starting position is the first test set; the test set obtained with the starting position shifted 30 pixels to the right of the upper-left corner is the second test set; the test set obtained with the starting position shifted 60 pixels to the right is the third test set; the test set obtained with the starting position shifted 30 pixels down is the fourth test set; and the test set obtained with the starting position shifted 60 pixels down is the fifth test set.
3. The SAR image classification method with an ASPP deconvolution network according to claim 1, wherein each of the first, second, third and fourth atrous spatial pyramid pooling (ASPP) modules in step (3a) has the structure: 4 parallel branches → splicing layer → convolutional layer, where the 1st, 2nd and 3rd of the 4 parallel branches are atrous convolutional layers, and the 4th branch is, in order, a pooling layer → an upsampling layer.
4. The SAR image classification method with an ASPP deconvolution network according to claim 3, wherein the parameters of the atrous spatial pyramid pooling (ASPP) modules are set as follows:
the dilation rates of the atrous convolutional layers of the 1st, 2nd and 3rd branches are set to 2, 4 and 8 pixels respectively, each atrous convolution kernel is 3 × 3, and each sliding step is 1 pixel;
the pooling mode of the pooling layer of the 4th branch is set to global average pooling;
the upsampling layer of the 4th branch is set to bilinear interpolation upsampling;
the convolution kernel size of the convolutional layer is set to 1 × 1, with a sliding step of 1 pixel;
the splicing layer is set to a matrix concatenation function.
5. The SAR image classification method with an ASPP deconvolution network according to claim 1, wherein the parameters of the deconvolution network in step (3b) are set as follows:
the number of feature maps of convolutional layer 1 is set to 32, with a 5 × 5 convolution kernel and a sliding step of 1 pixel;
the number of feature maps of convolutional layer 2 is set to the total number of ground object target classes contained in the SAR image to be classified, with a 1 × 1 convolution kernel and a sliding step of 1 pixel;
the numbers of feature maps of convolutional layers 3, 5 and 7 are set to 64, 128 and 256 respectively, each convolution kernel is 5 × 5, and each sliding step is 2 pixels;
the numbers of feature maps of convolutional layers 4 and 6 are set to 64 and 128 respectively, each convolution kernel is 1 × 1, and each sliding step is 1 pixel;
the pooling mode of the pooling layer is set to max pooling, with 32 feature maps, a 2 × 2 pooling window and a sliding step of 2 pixels;
the numbers of feature maps of deconvolution layers 1, 2 and 3 are set to 64, 128 and 256 respectively, each convolution kernel is 5 × 5, and each sliding step is 2 pixels;
the numbers of feature maps of the first, second, third and fourth ASPP modules are set to 32, 64, 128 and 256 respectively;
splicing layers 1, 2 and 3 are each set to a matrix concatenation function;
the upsampling layer is set to bilinear interpolation upsampling.
CN201910829626.6A 2019-01-25 2019-09-03 SAR image classification method with ASPP deconvolution network Active CN110414494B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019100710516 2019-01-25
CN201910071051 2019-01-25

Publications (2)

Publication Number Publication Date
CN110414494A (en) 2019-11-05
CN110414494B (en) 2022-12-02

Family

ID=68370301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910829626.6A Active CN110414494B (en) 2019-01-25 2019-09-03 SAR image classification method with ASPP deconvolution network

Country Status (1)

Country Link
CN (1) CN110414494B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN107944353A (en) * 2017-11-10 2018-04-20 西安电子科技大学 SAR image change detection based on profile ripple BSPP networks
CN108764330A (en) * 2018-05-25 2018-11-06 西安电子科技大学 SAR image sorting technique based on super-pixel segmentation and convolution deconvolution network

Also Published As

Publication number Publication date
CN110414494A (en) 2019-11-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant