CN114241297A - Remote sensing image classification method based on multi-scale pyramid space independent convolution - Google Patents
Remote sensing image classification method based on multi-scale pyramid space independent convolution
- Publication number
- CN114241297A (application CN202111369500.9A)
- Authority
- CN
- China
- Prior art keywords
- features
- scale
- convolution
- classification
- remote sensing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of remote sensing image classification, and provides a remote sensing image classification method based on multi-scale pyramid space independent convolution, comprising: S1, preprocessing the remote sensing imagery and selecting representative frames to make a classification sample data set by manual pixel-by-pixel labeling; S2, performing image enhancement on the classification sample data set obtained in S1, the image enhancement methods comprising horizontal and vertical flipping, random-size resampling, and Gaussian blur; S3, extracting features from the enhanced images using a deep residual network; S4, further constructing a multi-scale feature pyramid on the features of S3 and outputting multi-resolution, multi-scale features; S5, designing a spatially independent convolution that fuses the multi-scale features of S4 according to a decoupling principle and outputs fused features; and S6, mapping the fused features of S5 to categories using a linear layer and outputting the classification result. The invention realizes dynamic fusion of multi-scale features; the method is simple and effective, has a small computational cost, and further improves classification accuracy.
Description
Technical Field
The invention relates to the technical field of remote sensing image classification, in particular to a remote sensing image classification method based on multi-scale pyramid space independent convolution.
Background
Remote sensing image classification is fundamental, key work for land-use research, and a basic problem that must be solved first in land-resource investigation and evaluation at different scales and levels. Multi-scale ground objects are common in all kinds of remote sensing images, such as large-scale buildings and farmland and small-scale trees and vehicles, and the areas these objects occupy differ greatly. Constructing a remote sensing classification model that can recognize multi-scale ground objects simultaneously is therefore of great practical significance.
The classification model is the core of remote sensing image classification; current mainstream classification models comprise traditional machine learning and deep learning methods. Driven by the development of big data, parallel computing, and neural networks, deep learning methods based on the convolutional neural network (CNN) have developed rapidly and achieved strong results, such as the Fully Convolutional Network (FCN), PSPNet, and DeepLab. However, the ability of current downsampling-based CNN classification models to recognize multi-scale ground objects remains poor: heavy downsampling destroys the characteristics of small-scale ground objects, making them difficult to recognize and limiting application in real scenes.
Disclosure of Invention
The invention provides a remote sensing image classification scheme based on multi-scale pyramid space independent convolution, aiming at the problem that current CNN-based remote sensing image classification models do not recognize multi-scale ground objects well. Specifically:
A remote sensing image classification method based on multi-scale pyramid space independent convolution comprises the following steps:
S1, preprocessing the remote sensing image and selecting representative frames to make a classification sample data set by manual pixel-by-pixel labeling;
S2, performing image enhancement on the classification sample data set obtained in S1, the image enhancement methods comprising horizontal and vertical flipping, random-size resampling, and Gaussian blur;
S3, extracting features from the image enhanced in S2 using a deep residual network;
S4, further constructing a multi-scale feature pyramid on the image features of S3 and outputting multi-resolution, multi-scale features;
S5, designing a spatially independent convolution that fuses the multi-scale features of S4 according to a decoupling principle and outputs fused features;
and S6, mapping the fused features of S5 to categories using a linear layer and outputting the classification result.
Preferably, the deep residual network in S3 is stacked from a plurality of convolutional residual blocks, the residual block dividing the convolution into a direct-mapping part and a residual part, specifically: $x_{i+1} = x_i + F_c(x_i, w_i)$, where $x_i$ is the direct-mapping part and $F_c(x_i, w_i)$ is the residual part;
if $x_i$ and $x_{i+1}$ differ in channel number, a 1 × 1 convolution is needed to scale the dimension of $x_i$, and the residual block is expressed as $x_{i+1} = H(x_i, w_h) + F_c(x_i, w_i)$, where $H$ denotes the dimension-adjusting 1 × 1 convolution.
Preferably, in S4, the feature pyramid is built directly on the deep residual network; the feature map at each resolution is added point by point to the feature map of the next coarser level upsampled by a factor of two, and finally the scale features are aligned and output.
Preferably, in S5, let the multi-scale feature to be fused be $x \in \mathbb{R}^{(C\times G)\times H\times W}$, where $H$ and $W$ represent the height and width of the multi-scale features, $C$ is the number of channels per scale feature, and $G$ represents the number of scales; the spatially independent convolution is:
$\hat{W} = W \circledast x, \qquad F_{h,w} = \hat{W}_{h,w} \circledast x$
where $\hat{W}$ denotes the spatially independent convolution kernel generated by the convolution kernel $W \in \mathbb{R}^{(O\times(C\times G)\times\Omega\times\Omega)\times(C\times G)\times K\times K}$, and $F$ denotes the fused feature generated by $\hat{W}$.
Preferably, grouped convolution is used to generate $\hat{W}$, features from the same branch being treated as one group; a Softmax function is added after the grouped convolution to constrain the weights of the different-resolution features to sum to 1, and the improved spatially independent convolution is:
$\hat{W} = \mathrm{Softmax}_G(W \circledast x), \qquad W \in \mathbb{R}^{(O\times G)\times G\times K\times K}$
where $\mathrm{Softmax}_G$ is taken over the $G$ scale groups.
Preferably, in S6, the linear layer is a fully connected layer with the expression $y_{i,j} = W^T x_{i,j} + b$; the computation is performed point by point on the fused features output by S5, converting the feature dimension into the category dimension and realizing ground-object classification.
Compared with the prior art, the invention has the beneficial effects that:
(1) Aiming at the difficulty of multi-scale ground-object recognition in CNN-based remote sensing classification methods, the invention provides a multi-scale pyramid feature extraction method that better accounts for both large and small ground objects.
(2) The spatially independent convolution provided by the invention generates customized convolution kernels for different positions and realizes dynamic fusion of multi-scale features. Compared to standard convolution, the spatially independent convolution has the following advantages: 1) features of different scales are decoupled, giving stronger interpretability; 2) the method is simple and effective, the computation is small, and the classification accuracy is further improved.
Drawings
FIG. 1 is an overall structure diagram of the remote sensing image classification method based on the multi-scale pyramid space independent convolution according to the invention;
FIG. 2 is a block diagram of the deep residual network of the present invention;
FIG. 3 is a block diagram of a multi-scale feature pyramid of the present invention;
FIG. 4 is a schematic diagram of the spatially independent convolution of the present invention;
FIG. 5 is a schematic diagram of the wide-range building classification in an embodiment of the present invention;
fig. 6 is a schematic diagram of the wide-range wetland classification in the embodiment of the invention.
Detailed Description
The invention is described in detail below by way of exemplary embodiments. It should be understood, however, that elements, structures and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
The recognition capability of downsampling-based CNN classification models for multi-scale ground objects is poor: small ground objects often become too small, or even disappear, after heavy downsampling, which limits the application of classification models in real scenes.
In order to solve the above problems, an embodiment of the present invention provides a remote sensing image classification method based on multi-scale pyramid space independent convolution: a feature pyramid is constructed on top of a deep residual network to output multi-scale features, and spatially independent convolution is used to fuse the multi-scale features, realizing multi-scale ground-object classification.
Example: classification of coastal-zone buildings and wetlands based on Gaofen-2 ("high-resolution No. 2", GF-2) remote sensing imagery of the Qingdao region. The specific steps are as follows:
S1, orthorectification and radiometric calibration are applied to the GF-2 multispectral image, and orthorectification to the panchromatic image; the multispectral and panchromatic images are then fused, atmospheric correction is applied to the fused image, and the fused high-resolution multispectral image, with a resolution of about 1 meter, is output. Representative frames are then cropped from the fused high-resolution imagery according to the principles of full ground-object coverage and low redundancy, labeled in detail at the pixel level, and assembled into a classification data set.
S2, before the training sample images obtained in S1 are read into the model, image enhancement is applied to indirectly increase the number of samples. The enhancement methods comprise horizontal and vertical flipping, random-size resampling (0.5 to 2.0 times), and Gaussian blur. After processing, the images are cropped to 512 × 512 according to a fixed rule to satisfy the model's input condition.
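As a concrete illustration, a minimal sketch of this augmentation pipeline in Python with torchvision follows. The flip probabilities, the blur parameters, and the `RandomRescale` helper are assumptions for illustration; the patent specifies only the operations themselves and the 0.5 to 2.0 resampling range. In practice the label mask must undergo the same geometric transforms; only the image path is sketched here.

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

class RandomRescale:
    """Resample the image by a random factor in [0.5, 2.0], as in S2."""
    def __init__(self, lo=0.5, hi=2.0):
        self.lo, self.hi = lo, hi

    def __call__(self, img):
        f = random.uniform(self.lo, self.hi)
        w, h = img.size                      # PIL images report (width, height)
        return TF.resize(img, [int(h * f), int(w * f)])

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),           # horizontal flipping
    T.RandomVerticalFlip(p=0.5),             # vertical flipping
    RandomRescale(0.5, 2.0),                 # random-size resampling
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # Gaussian blur
    T.RandomCrop(512, pad_if_needed=True),   # cut to the 512 x 512 input size
])
```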
And S3, features are extracted from the enhanced image of S2 using a deep residual network (ResNet). The deep residual network is stacked from multiple convolutional residual blocks, as shown in FIG. 2. Let the current feature be $x_i$; a plain convolution operation takes the convolved features directly as its output, as in formula (1):

$x_{i+1} = F_c(x_i, w_i)$ (1)

where $F_c$ denotes the convolution operation and $w_i$ the convolution kernel parameters. The residual block instead divides the convolution process into a direct-mapping part and a residual part, as in formula (2):

$x_{i+1} = x_i + F_c(x_i, w_i)$ (2)

where $x_i$ is the direct-mapping part and $F_c(x_i, w_i)$ is the residual part. Furthermore, if $x_i$ and $x_{i+1}$ differ in channel number, a 1 × 1 convolution must be applied to $x_i$ to scale its dimension, and the residual block becomes:

$x_{i+1} = H(x_i, w_h) + F_c(x_i, w_i)$ (3)

where $H$ denotes the dimension-adjusting 1 × 1 convolution.
Compared with plain convolution, the convolutional residual block is easier to optimize, which improves the training speed and accuracy of the model, and model accuracy does not degrade as the network deepens. In addition, introducing the residual mechanism adds no extra computation to the network.
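For illustration, a minimal PyTorch sketch of such a convolutional residual block follows, implementing formulas (2) and (3); the layer widths and the BatchNorm/ReLU arrangement are assumptions in the style of standard ResNet blocks rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x_{i+1} = H(x_i) + F_c(x_i); H is a 1x1 projection used only
    when the channel count or spatial size changes (formula (3))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.residual = nn.Sequential(            # F_c(x_i, w_i)
            nn.Conv2d(in_ch, out_ch, 3, stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if in_ch != out_ch or stride != 1:        # H(x_i, w_h)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()         # direct-mapping part x_i

    def forward(self, x):
        return torch.relu(self.shortcut(x) + self.residual(x))
```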
S4, a multi-scale feature pyramid is further constructed on the features of S3, outputting multi-resolution, multi-scale features. The feature pyramid addresses the failure to recognize small ground objects caused by ResNet's large downsampling factor, thereby enabling multi-scale ground-object classification. The pyramid is built directly on ResNet: the feature map at each resolution is added point by point to the feature map of the next coarser level upsampled by a factor of two, and finally the features of all scales are aligned and output, as shown in FIG. 3. Through this operation, the feature map at each scale fuses features of different scales and different semantic strengths, guaranteeing each scale the ability to recognize ground objects at the corresponding resolution. The pyramid adds only extra cross-layer connections on top of ResNet and has almost no effect on model efficiency.
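A minimal sketch of this pyramid in PyTorch is given below. The backbone channel counts, the common width of 256, and the interpolation modes are assumed for illustration; the output is concatenated along channels to form the (C × G)-channel tensor consumed by S5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Top-down pyramid of S4: project each backbone level to a common
    width, add the 2x-upsampled coarser level point by point, then
    align all scales to one resolution and concatenate them."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), width=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, width, kernel_size=1) for c in in_channels)

    def forward(self, feats):
        # feats: backbone maps ordered fine -> coarse (e.g. strides 4..32)
        p = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(p) - 2, -1, -1):       # top-down, point-by-point add
            p[i] = p[i] + F.interpolate(p[i + 1], size=p[i].shape[-2:],
                                        mode="nearest")
        target = p[0].shape[-2:]                  # align every scale
        return torch.cat([F.interpolate(q, size=target, mode="bilinear",
                                        align_corners=False) for q in p], dim=1)
```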
And S5, a spatially independent convolution is designed to fuse the multi-scale features of S4 according to a decoupling principle and output the fused features. Let the output multi-scale feature be $x \in \mathbb{R}^{(C\times G)\times H\times W}$, where $H$ and $W$ denote the height and width of the multi-scale features, $C$ is the number of channels per scale feature, and $G$ is the number of scales. The aim of the invention is an adaptive method for fusing the aligned multi-scale features generated by the pyramid backbone, allowing feature points at different positions to adjust the weights of the different scales automatically, i.e. with spatial independence. However, because of the translation invariance of a standard convolution layer, all positions share the same convolution kernel, and the kernel weights are fixed after training, so the weights of different resolutions cannot be adjusted dynamically according to spatial position. The standard convolution can be expressed as:

$F = W \circledast x$ (4)

where $W \in \mathbb{R}^{O\times(C\times G)\times K\times K}$ is the convolution kernel parameter and $F$ is the feature generated by the convolution. The spatially independent convolution is designed to resolve this limitation of the standard convolution. Its main idea is to use a convolution to generate convolution kernels, dynamically producing an independent multi-scale fusion kernel for each spatial position, as in formula (5):

$\hat{W} = W \circledast x, \qquad F_{h,w} = \hat{W}_{h,w} \circledast x$ (5)

where $\hat{W}$ denotes the spatially independent convolution kernel generated by the convolution kernel $W \in \mathbb{R}^{(O\times(C\times G)\times\Omega\times\Omega)\times(C\times G)\times K\times K}$ and $F$ denotes the fused feature generated by $\hat{W}$. Generating a complete convolution kernel $\hat{W}_{h,w}$ for every point, however, consumes a large amount of memory. Therefore, for the aligned multi-scale features, the invention makes a simplifying assumption: features from the same resolution have the same importance and share the same weight. Based on this assumption, the invention uses grouped convolution to generate $\hat{W}$, treating features from the same branch as one group, and appends a Softmax function after the grouped convolution to constrain the weights of the different scales to sum to 1, as shown in FIG. 4. The improved spatially independent convolution is shown in formula (6):

$\hat{W} = \mathrm{Softmax}_G(W \circledast x), \qquad W \in \mathbb{R}^{(O\times G)\times G\times K\times K}$ (6)

where $\mathrm{Softmax}_G$ is taken over the $G$ scale groups.
Using the spatially independent convolution to adaptively fuse the aligned multi-scale features generated by the feature pyramid allows feature points at different positions to adjust the weights of the different scales automatically, decoupling the multi-scale features and improving accuracy while retaining high interpretability.
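The sketch below illustrates the improved spatially independent convolution of formula (6) in its simplest setting, where the grouped convolution generates one scalar weight per scale per position (i.e. a pointwise generated kernel with O = 1); the kernel size of the weight-generating convolution is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyIndependentFusion(nn.Module):
    """Grouped convolution predicts, for every spatial position, one
    weight per scale branch; Softmax constrains the G weights at each
    position to sum to 1, and the weighted branches are summed."""
    def __init__(self, channels, num_scales, kernel_size=3):
        super().__init__()
        self.c, self.g = channels, num_scales
        # groups=num_scales: each branch only sees its own features,
        # matching "features from the same branch form a group".
        self.weight_gen = nn.Conv2d(channels * num_scales, num_scales,
                                    kernel_size, padding=kernel_size // 2,
                                    groups=num_scales)

    def forward(self, x):                             # x: (N, C*G, H, W)
        n, _, h, w = x.shape
        w_hat = F.softmax(self.weight_gen(x), dim=1)  # (N, G, H, W), sums to 1
        x = x.view(n, self.g, self.c, h, w)           # split the G branches
        return (x * w_hat.unsqueeze(2)).sum(dim=1)    # (N, C, H, W)

fusion = SpatiallyIndependentFusion(channels=256, num_scales=4)
out = fusion(torch.randn(2, 256 * 4, 128, 128))   # -> (2, 256, 128, 128)
```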
And S6, the fused features of S5 are mapped to categories through a linear layer, and the classification result is output. The linear layer, i.e. the fully connected layer, can be expressed as:

$y_{i,j} = W^T x_{i,j} + b$ (7)

The computation is performed point by point on the fused features output by S5, converting the feature dimension into the category dimension and thereby realizing ground-object classification.
And S7, the model accuracy is verified. This example divides the data set into a training set and a test set at a ratio of 6 to 4 and tests the model on the test set with the overall accuracy (OA), F1 score, and mean intersection over union (mIoU), defined as follows:

$\mathrm{OA} = \dfrac{TP + TN}{N}, \quad F1 = \dfrac{2 \times PR \times RE}{PR + RE}, \quad \mathrm{mIoU} = \dfrac{1}{k} \sum_{c=1}^{k} \dfrac{TP_c}{TP_c + FP_c + FN_c}$

where TP, TN, FP, and FN represent the true positives, true negatives, false positives, and false negatives in the confusion matrix, N is the total number of pixels, k is the number of classes, and PR and RE are the precision and recall calculated from the confusion matrix. On the test set the model achieves OA: 95.12, F1 score: 79.63, and mIoU: 66.17.
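For reference, a sketch of how these metrics can be computed from a confusion matrix is given below; the row/column convention and the NumPy implementation are assumptions, not part of the patent.

```python
import numpy as np

def metrics(conf):
    """OA, macro F1 and mIoU from a k x k confusion matrix,
    assuming rows = reference classes and columns = predictions."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp            # predicted as class c, actually other
    fn = conf.sum(axis=1) - tp            # actually class c, predicted other
    oa = tp.sum() / conf.sum()            # overall accuracy
    pr = tp / np.maximum(tp + fp, 1e-12)  # per-class precision (PR)
    re = tp / np.maximum(tp + fn, 1e-12)  # per-class recall (RE)
    f1 = float((2 * pr * re / np.maximum(pr + re, 1e-12)).mean())
    miou = float((tp / np.maximum(tp + fp + fn, 1e-12)).mean())
    return oa, f1, miou

print(metrics(np.array([[90, 10], [5, 95]])))  # toy 2-class example
```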
And S8, the real scene is classified. Buildings and wetlands in the GF-2 imagery of the Qingdao region are selected as classification targets. Before the imagery is read into the model, it must be tiled according to a fixed rule to satisfy the input condition; after classification by the model, the tiles are stitched back into the original image according to the same cutting rule. FIG. 5 and FIG. 6 show the wide-range classification results for buildings and wetlands.
The classification results show that the method adapts to ground objects of different sizes and realizes multi-scale ground-object classification with a fixed model configuration.
The above-mentioned embodiments are provided merely to illustrate the present invention and do not limit its scope; various simple modifications and variations made by those skilled in the art within the technical scope of the present invention should fall within the appended claims.
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.
Claims (6)
1. A remote sensing image classification method based on multi-scale pyramid space independent convolution is characterized by comprising the following steps:
S1, preprocessing the remote sensing image and selecting representative frames to make a classification sample data set by manual pixel-by-pixel labeling;
S2, performing image enhancement on the classification sample data set obtained in S1, the image enhancement methods comprising horizontal and vertical flipping, random-size resampling, and Gaussian blur;
S3, extracting features from the image enhanced in S2 using a deep residual network;
S4, further constructing a multi-scale feature pyramid on the image features of S3 and outputting multi-resolution, multi-scale features;
S5, designing a spatially independent convolution that fuses the multi-scale features of S4 according to a decoupling principle and outputs fused features;
and S6, mapping the fused features of S5 to categories using a linear layer and outputting the classification result.
2. The remote sensing image classification method based on multi-scale pyramid space independent convolution of claim 1, characterized in that the deep residual network in S3 is stacked from a plurality of convolutional residual blocks, the residual block dividing the convolution into a direct-mapping part and a residual part, specifically: $x_{i+1} = x_i + F_c(x_i, w_i)$, where $x_i$ is the direct-mapping part and $F_c(x_i, w_i)$ is the residual part;
if $x_i$ and $x_{i+1}$ differ in channel number, a 1 × 1 convolution is used to scale the dimension of $x_i$, and the residual block is expressed as $x_{i+1} = H(x_i, w_h) + F_c(x_i, w_i)$, where $H$ denotes the dimension-adjusting 1 × 1 convolution.
3. The remote sensing image classification method based on multi-scale pyramid space independent convolution of claim 1, characterized in that in S4 the feature pyramid is built directly on the deep residual network, the feature map at each resolution is added point by point to the feature map of the next coarser level upsampled by a factor of two, and finally the scale features are aligned and output.
4. The remote sensing image classification method based on multi-scale pyramid space independent convolution of claim 1, characterized in that in S5 the multi-scale feature to be fused is set as $x \in \mathbb{R}^{(C\times G)\times H\times W}$, where $H$ and $W$ represent the height and width of the multi-scale features, $C$ is the number of channels per scale feature, and $G$ represents the number of scales; the spatially independent convolution is: $\hat{W} = W \circledast x$, $F_{h,w} = \hat{W}_{h,w} \circledast x$, where $\hat{W}$ is the spatially independent convolution kernel generated by the convolution kernel $W \in \mathbb{R}^{(O\times(C\times G)\times\Omega\times\Omega)\times(C\times G)\times K\times K}$ and $F$ is the fused feature generated by $\hat{W}$.
5. The remote sensing image classification method based on multi-scale pyramid space independent convolution of claim 4, characterized in that grouped convolution is used to generate $\hat{W}$, features from the same branch being treated as one group; a Softmax function is added after the grouped convolution to constrain the weights of the different-resolution features to sum to 1, and the improved spatially independent convolution is: $\hat{W} = \mathrm{Softmax}_G(W \circledast x)$, $W \in \mathbb{R}^{(O\times G)\times G\times K\times K}$.
6. The remote sensing image classification method based on multi-scale pyramid space independent convolution of claim 1, characterized in that in S6 the linear layer is a fully connected layer with the expression $y_{i,j} = W^T x_{i,j} + b$; the computation is performed point by point on the fused features output by S5, converting the feature dimension into the category dimension and realizing ground-object classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111369500.9A CN114241297A (en) | 2021-11-16 | 2021-11-16 | Remote sensing image classification method based on multi-scale pyramid space independent convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111369500.9A CN114241297A (en) | 2021-11-16 | 2021-11-16 | Remote sensing image classification method based on multi-scale pyramid space independent convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114241297A true CN114241297A (en) | 2022-03-25 |
Family
ID=80749925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111369500.9A Pending CN114241297A (en) | 2021-11-16 | 2021-11-16 | Remote sensing image classification method based on multi-scale pyramid space independent convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114241297A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114511791A (en) * | 2022-04-20 | 2022-05-17 | 成都锦城学院 | Regional water resource classification evaluation method based on improved deep residual error network |
2021
- 2021-11-16 CN CN202111369500.9A patent/CN114241297A/en active Pending
Similar Documents
Publication | Title
---|---
CN109522966B (en) | Target detection method based on dense connection convolutional neural network
CN107154023B (en) | Face super-resolution reconstruction method based on generative adversarial network and sub-pixel convolution
CN109886330B (en) | Text detection method and device, computer readable storage medium and computer equipment
CN112396607B (en) | Deformable convolution fusion enhanced street view image semantic segmentation method
CN112084869B (en) | Compact quadrilateral representation-based building target detection method
CN109376804A (en) | Hyperspectral remote sensing image classification method based on attention mechanism and convolutional neural networks
CN111523521A (en) | Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN112418278A (en) | Multi-class object detection method, terminal device and storage medium
CN106548169A (en) | Blurred text enhancement method and device based on deep neural network
Wang et al. | RSCNet: A residual self-calibrated network for hyperspectral image change detection
CN111291826A (en) | Multi-source remote sensing image pixel-by-pixel classification method based on correlation fusion network
CN113239736B (en) | Land coverage classification annotation drawing acquisition method based on multi-source remote sensing data
CN110633640A (en) | Method for identifying complex scene by optimizing PointNet
CN113160291B (en) | Change detection method based on image registration
CN114549913A (en) | Semantic segmentation method and device, computer equipment and storage medium
CN111179270A (en) | Image co-segmentation method and device based on attention mechanism
CN113111740A (en) | Feature weaving method for remote sensing image target detection
CN114299358A (en) | Image quality evaluation method and device, electronic equipment and machine-readable storage medium
CN113887472A (en) | Remote sensing image cloud detection method based on cascade color and texture feature attention
CN116563606A (en) | Hyperspectral image classification method based on dual-branch spatial spectrum global feature extraction network
CN115620141A (en) | Target detection method and device based on weighted deformable convolution
CN114241297A (en) | Remote sensing image classification method based on multi-scale pyramid space independent convolution
CN112668675B (en) | Image processing method and device, computer equipment and storage medium
CN114332780A (en) | Traffic man-vehicle non-target detection method for small target
CN116863032B (en) | Flood disaster scene generation method based on generative adversarial network
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |